EXECUTIVE SUMMARY


This  report  examines  the  effects  of  the  Health  Care  Financing  Administration's  (HCFA's)  Per- 
Episode  Home  Health  Prospective  Payment  Demonstration  on  the  quality  of  home  health  care 
throughout  the  three  years  of  the  demonstration.  It  provides  an  update  to  our  previous  report  on  the 
quality  of  care  in  the  demonstration,  which  covered  the  first  two  years  of  the  demonstration 
(Chen  2000).  The  demonstration  tests  the  extent  to  which  a  fixed,  lump-sum  prospective  payment 
to  home  health  agencies  for  the  first  120  days  of  each  episode  of  care  provided  to  Medicare 
beneficiaries  increases  efficiency  in  service  provision.  By  allowing  agencies  to  retain  most  of  any 
surplus  payments  over  cost,  this  payment  method  gives  agencies  an  incentive  to  provide  home  health 
care  in  a  cost-efficient  manner. 

Our  main  findings  are  that,  despite  prospectively  paid  agencies'  progressive  reduction  of 
services  over  time,  there  were  no  serious  adverse  effects  on  major  outcomes  of  care.  We  did  find 
a  few  small  negative  effects  in  Year  3  on  stabilization  in  functioning,  as  well  as  several  large  positive 
effects  in  all  three  years  on  improvement  in  symptoms.  These  could  be  viewed  either  as  subtle  signs 
of  negative  effects  late  in  the  demonstration  or,  just  as  easily,  as  the  absence  of  any  true  impacts. 

BACKGROUND 

Ninety-one  Medicare-certified  home  health  agencies  in  five  states—California,  Florida,  Illinois, 
Massachusetts,  and  Texas—enrolled  in  the  three-year,  per-episode  demonstration.  Forty-eight  were 
randomly  assigned  to  prospective  payment  (the  "treatment,"  or  "intervention,"  group).  The 
remaining  43  were  assigned  to  the  control  group,  which  continued  to  operate  under  cost 
reimbursement.  Each  agency  entered  the  demonstration  at  the  start  of  its  fiscal  year.  The  first 
agencies  entered  the  demonstration  in  June  1995,  and  the  last  ones  in  January  1996. 

Agencies  assigned  to  the  treatment  group  received  a  lump-sum  payment  for  the  first  120  days 
of  home  health  care,  regardless  of  the  number  or  cost  of  visits  provided.  The  agencies  were  thus  "at 
risk" for the costs of care incurred during this period. Only after the 120-day at-risk period, and after
a  45-day  gap  in  services  had  elapsed,  was  an  agency  able  to  receive  a  new  per-episode  payment  for 
a  given  Medicare  beneficiary.  For  each  visit  beyond  120  days  that  did  not  begin  a  new  episode 
(referred  to  as  the  "outlier  period"),  treatment  agencies  received  a  fixed  payment  rate  that  varied  by 
the  type  of  visit. 

Prospective  payments  for  the  at-risk  period  were  based  on  a  treatment  agency's  costs,  episode 
profile,  and  case  mix  in  the  fiscal  year  preceding  its  entry  into  the  demonstration  (the  base  year), 
adjusted  for  inflation  and  changes  in  case  mix  in  each  demonstration  year.  HCFA  also  shared  in 
treatment  agencies'  profits  and  losses  above  and  below  certain  levels.  The  profit-sharing 
arrangement  was  meant  to  counteract  the  incentive  to  reduce  services  dramatically  at  the  expense 
of  quality,  as  well  as  prevent  agencies  from  realizing  excessive  profits  at  public  expense.  The  loss- 
sharing  arrangement  was  meant  to  encourage  agencies  to  participate  in  the  demonstration  by 
minimizing their "downside risk."

Agencies assigned to the control group continued to be paid under the existing
cost-reimbursement system. Payments were based on agencies' actual per-visit costs, up to 112 percent
of  the  mean  cost  incurred  by  all  agencies  (for  the  agency's  mix  of  visits)  in  the  same  geographic  area. 

Our  previous  analyses  of  the  early  part  of  the  demonstration  found  that  prospective  payment 
generated large reductions in service use without demonstrable impacts on quality of care, Medicare
service use, and non-Medicare service use. We recently submitted a follow-up report on service use
that used data from all three years of the demonstration. In that report we found that both treatment
and control agencies continued to reduce both the number of visits they provided and the length of
patient  episodes  in  the  third  demonstration  year,  resulting  in  large  and  stable  impacts  on  service  use 
throughout  the  demonstration. 

RESEARCH  QUESTIONS  AND  METHODS 

Our  basic  research  question  was  whether,  given  the  steady  and  progressive  service  reductions 
over  time,  patients  of  prospectively  paid  agencies  experience  outcomes  significantly  different  from 
those  of  patients  of  cost-reimbursed  agencies  within  each  demonstration  year.  We  constructed 
multiple  outcomes  measures  from  two  sources:  (1)  Medicare  claims  and  enrollment  data,  and  (2) 
demonstration  Quality  Assurance  (QA)  data.  We  studied  the  following  outcomes: 

1. Functional Status. Whether the patient improved or stabilized in each of several Basic
Activities of Daily Living (BADLs) and Instrumental Activities of Daily Living
(IADLs). BADLs were grooming, bathing, toileting, transferring, and ambulating.
IADLs were preparing light meals, housekeeping, and managing oral medications.

2. Clinical Symptoms and Mortality. Whether the patient improved or stabilized in each
of several clinical symptoms, and whether the patient had died during the at-risk period.
Symptom measures were pain, pressure ulcers, wound status, dyspnea, urinary
incontinence or catheter present, confusion, and behavioral problems.

3. Use of Emergency Health Care. Whether the patient had an emergency visit to a
hospital emergency room, physician's office, or outpatient clinic during the at-risk
period.

4. Use of Medicare Part A Covered Health Services. Whether the patient was admitted
during the at-risk period to a hospital, skilled nursing facility, or home health care
agency for a "same body system diagnosis" (a diagnosis involving the same general
problem as the original home health care admitting diagnosis).

Patients  were  the  units  of  analysis.  We  used  logit  models  that  included  indicator  variables  for 
agency  treatment  status  and  for  interactions  between  the  treatment-control  status  and  the 
demonstration  year.  The  models  also  controlled  for  any  residual  differences  between  treatment  and 
control agencies and their patients despite random assignment of agencies. Patient characteristics
were drawn from Medicare claims and enrollment data, QA data, and case-mix adjustment data.
Agency  characteristics  were  drawn  from  agency  cost  reports,  data  from  the  demonstration 
implementation  contractor,  and  the  Area  Resource  File  (ARF).  We  also  performed  a  subgroup 
analysis  to  detect  any  effect  of  prospective  payment  by  whether  the  agency  was  a  high-  or  low-use 
agency  (as  defined  by  the  agency's  baseline  practice  pattern). 

Observations  were  weighted  to  give  each  agency  equal  representation  in  the  analysis.  The 
estimated  standard  errors  of  the  treatment-control  differences  accounted  for  the  effects  of  weighting 
and within-agency clustering. We also performed sensitivity analyses to assess the robustness of the
estimates  to  alternative  weighting  schemes  and  specifications  of  the  sample. 
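
The weighting step described above can be illustrated with a short Python sketch. It shows only how equal-agency weights change a simple outcome rate; it does not reproduce the logit models or the cluster-adjusted standard errors used in the evaluation. The function names and the five example patients are hypothetical.

```python
from collections import Counter


def equal_agency_weights(agency_ids):
    """Return one weight per observation so that each agency's weights sum to 1."""
    counts = Counter(agency_ids)
    return [1.0 / counts[agency] for agency in agency_ids]


def weighted_rate(outcomes, weights):
    """Weighted share of observations with outcome == 1."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical data: agency "A" contributes four patients, agency "B" one.
agency_ids = ["A", "A", "A", "A", "B"]
outcomes = [1, 1, 1, 1, 0]

weights = equal_agency_weights(agency_ids)
unweighted = sum(outcomes) / len(outcomes)
weighted = weighted_rate(outcomes, weights)
print(unweighted, weighted)
```

Without weights, the larger agency dominates the rate; with equal-agency weights, each agency counts the same regardless of how many patients it admitted.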

FINDINGS 

In  the  face  of  continued  service  cuts,  prospective  payment  had  no  adverse  impacts  on  major 
outcomes  in  any  of  the  three  years. 

There  were  no  impacts  on  the  major  outcomes  of  mortality,  emergency  care,  and  admissions  to 
hospitals,  nursing  homes,  or  home  health  care  agencies. 

We  found  conflicting  findings  of  questionable  importance  in  measures  of  functioning  and 
clinical  symptoms. 

In  Year  3  only,  we  found  small  negative  effects  (unfavorable  to  treatment  agencies),  ranging 
from  two  to  three  percent  of  the  control  group  mean,  on  three  ADL  stabilization  outcomes.  In 
contrast, we found large positive effects (favorable to treatment agencies), ranging from 9 to 18
percent  of  the  control  group  mean,  on  five  clinical  symptom  improvement  outcomes.  These  positive 
effects  were  present  across  two  or  three  years. 

The  interpretation  of  these  findings  is  unclear.  One  could  view  the  three  small  negative  effects 
in  Year  3  as  subtle  signs  of  problems  late  in  the  demonstration,  when  service  levels  fell  to  their 
lowest.  Or,  one  could  just  as  well  conclude  that  there  were  no  true  impacts  because  of  (1)  the 
inconsistency  of  the  few  isolated  late  negative  effects  on  ADL  stabilization  with  the  absence  of 
effects on ADL improvement, (2) the inconsistency of the numerous large positive effects on clinical
symptom  improvement  with  the  absence  of  any  effects  on  clinical  symptom  stabilization,  and  (3)  the 
lack  of  correlation  between  these  observed  effects  and  the  known  shifts  in  service  during  the  three 
demonstration years.

We found no compelling evidence for subgroup impacts.

We  found  a  number  of  statistically  significant,  but  inconsistent,  treatment-control  differences 
from  which  we  could  draw  no  firm  conclusions.  These  differences  occurred  only  among  high-use 
agencies,  and  only  in  certain  functioning  and  clinical  symptom  outcomes.  Similar  to  the  main 
analysis,  the  differences  consisted  of  a  few  small  effects  unfavorable  to  high-use  treatment  agencies, 
and  several  larger  effects  favorable  to  high-use  treatment  agencies.  We  actually  had  anticipated  a 
greater  impact  on  quality  among  low-use  agencies,  given  their  more  aggressive  reduction  of  services, 
but,  in  fact,  no  treatment  effects  appeared  among  the  low-use  agencies.  Although  there  was 
insufficient  evidence  for  concluding  that  prospective  payment  truly  affected  high-use  agencies  any 
differently  from  low-use  agencies,  it  may  be  wise  to  pay  special  attention  to  the  quality  of  care  of 
high-use  agencies  as  prospective  payment  is  implemented. 

STUDY  STRENGTHS  AND  LIMITATIONS 

The  study  had  numerous  design  strengths. 

The  study  relied  on  random  assignment  of  agencies  to  prospective  payment  or  cost- 
reimbursement.  A  rich  set  of  control  variables  was  available.  Sample  sizes  for  all  outcomes  were 
large. 

Caution  is  warranted  in  extrapolating  study  results  to  the  new  national  home  health 
prospective  payment  system. 

First,  agency  participation  was  voluntary.  As  a  result,  agencies  in  the  demonstration  were  more 
likely  to  be  large,  full-service  agencies  and  less  likely  to  be  hospital-based.  Since  the  demonstration 
agencies  differed  from  typical  agencies  nationwide,  results  may  not  apply  well  to  a  nationwide 
program  of  prospective  payment.  However,  two  points  suggest  that  our  results  are  generalizable: 
(1)  demonstration  agencies  did  include  a  wide  variety  of  types  of  agencies,  and  (2)  we  found  no 
definite  subgroup  effects  in  this  report  (or  in  our  earlier  report,  Chen  2000). 

Second,  the  newly  implemented  national  home  health  prospective  payment  system  differs 
substantially  from  the  one  in  the  demonstration.  The  new  system  does  not  base  agencies'  payments 
on  historic  costs  per  episode,  nor  does  it  provide  loss  protection.  These  features  may  prompt 
agencies  to  cut  service  use  even  more  aggressively  than  in  the  demonstration,  with  potentially 
deleterious  consequences  for  quality  of  care. 

Third,  the  demonstration  was  time-limited.  Agencies  may  respond  to  a  permanent  payment 
system  differently  from  the  way  they  respond  to  a  temporary  one.  The  duration  of  follow-up  time 
on  the  agencies  was  also  limited,  and  quality  effects  might  not  have  appeared  until  the  system  had 
been in place for a longer period of time.

POLICY IMPLICATIONS


There  is  a  substantial  margin  of  safety. 

Considerable  reductions  in  service  use  are  possible  without  major  decrements  in  quality  of  care. 
Even  as  agencies  made  progressive  and  substantial  cuts  in  service  throughout  the  demonstration,  no 
adverse  effects  appeared  in  any  of  a  wide  variety  of  outcomes  during  the  first  two  years  of  the 
demonstration.  Not  until  Year  3  do  weak  hints  of  potential  negative  effects  appear,  and,  as 
discussed,  it  is  debatable  whether  they  even  represent  an  impact  of  prospective  payment. 

The  findings  underscore  the  importance  of  continued  monitoring  of  quality  of  care  under 
prospective  payment. 

Despite  our  uncertainty  over  the  presence  of  any  impacts  on  quality  of  care,  we  believe  the  most 
prudent  stance  is  to  view  the  small  negative  effects  as  a  cautionary  reminder.  They  underscore  the 
importance  of  HCFA's  efforts  to  implement  ongoing  programs  of  quality  assurance  under  the  new 
home health prospective payment system.

I. THE PER-EPISODE HOME HEALTH DEMONSTRATION AND EVALUATION


The  Per-Episode  Home  Health  Prospective  Payment  Demonstration  of  the  Health  Care 
Financing  Administration  (HCFA)  tests  the  extent  to  which  prospective  payment  for  Medicare  home 
health  services  increases  efficiency  in  service  provision.  Per-episode  payment  encourages  efficiency 
by  giving  agencies  the  incentive  to  reduce  their  costs.  Specifically,  under  the  demonstration  payment 
system,  any  savings  generated  from  lower  costs  per  episode  of  patient  care  might  result  in  profit  for 
the  agency.  These  incentives  differ  greatly  from  those  under  a  system  of  cost-based  reimbursement 
that  provides  no  rewards  for  cost  containment. 

The  current  report  updates  our  earlier  report  on  the  quality  of  home  health  care,  which  covered 
the  first  two  years  of  the  demonstration.  We  include  additional  data  from  the  third  year  of  the 
demonstration  and  analyze  whether  any  changes  occurred  in  indicators  of  quality  over  time.  This 
additional  analysis  will  provide  valuable  information  on  longer-term  effects  of  prospective  payment 
on  patient  outcomes. 

The  rest  of  this  chapter  presents  a  brief  overview  of  the  Medicare  prospective  payment 
demonstration  and  the  results  of  our  previous  reports.  More  background  information  is  available  in 
our earlier reports (Chen 2000; Trenholm 2000a; Trenholm 2000b; Schore 2000; and Phillips 2000).

A.   THE  PER-EPISODE  DEMONSTRATION 

HCFA developed the Home Health Prospective Payment Demonstration to assess whether the profit
motive  can  increase  efficiency  in  the  provision  of  Medicare  home  health  care  and  thereby  reduce 
public  expenditures,  without  sacrificing  access  to  care  or  the  quality  of  care.  After  an  initial  test  of 
a  per-visit  payment  system  that  failed  to  reduce  overall  costs,  the  demonstration  moved  on  to  the 
second phase of testing a per-episode payment system.

Ninety-one Medicare-certified home health agencies in five states (California, Florida, Illinois,
Massachusetts, and Texas) enrolled in the three-year, per-episode demonstration. Forty-eight were
randomly  assigned  to  prospective  payment  (the  "treatment"  or  "intervention"  group).  The  remaining 
43  were  assigned  to  the  control  group,  which  continued  to  operate  under  cost  reimbursement.  Each 
agency  entered  the  demonstration  at  the  start  of  its  fiscal  year.  The  first  agencies  entered  the 
demonstration  in  June  1995  and  the  last  ones  in  January  1996. 

Agencies  assigned  to  the  treatment  group  received  a  lump-sum  payment  for  the  first  120  days 
of  home  health  care,  regardless  of  the  number  or  cost  of  visits  provided.  The  agencies  were  thus  "at 
risk" for the costs of care incurred during this period. Only after the 120-day at-risk period, and after
a 45-day gap in services had elapsed, was an agency able to receive a new per-episode payment for
a  given  Medicare  beneficiary.  For  each  visit  beyond  120  days  that  did  not  begin  a  new  episode 
(referred  to  as  the  "outlier  period"),  treatment  agencies  received  a  fixed  payment  rate  that  varied  by 
the  type  of  visit. 

The  prospective  payments  for  the  at-risk  period  were  based  on  a  treatment  agency's  costs, 
episode  profile,  and  case  mix  in  the  fiscal  year  preceding  its  entry  into  the  demonstration  (the  base 
year),  adjusted  for  inflation  and  changes  in  case  mix  in  each  demonstration  year.  HCFA  also  shared 
in  treatment  agencies'  profits  and  losses  above  and  below  certain  levels.  The  profit-sharing 
arrangement  was  meant  to  counteract  the  incentive  to  dramatically  reduce  services  at  the  expense 
of quality, as well as prevent agencies from realizing excessive profits at public expense. The
loss-sharing arrangement was meant to encourage agencies to participate in the demonstration by
minimizing their "downside risk."

Agencies assigned to the control group continued to be paid under the existing
cost-reimbursement system. Payments were based on agencies' actual per-visit costs, up to 112 percent
of  the  mean  cost  incurred  by  all  agencies  (for  the  agency's  mix  of  visits)  in  the  same  geographic  area. 
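
The two payment rules described above can be contrasted in a brief Python sketch. It is an illustration under stated assumptions, not HCFA's payment formula: all dollar amounts, visit types, and function names are hypothetical, the profit- and loss-sharing adjustments are omitted because their thresholds are not given here, and the 112 percent limit is applied in aggregate across the agency's mix of visits, which is our reading of the rule.

```python
def treatment_payment(lump_sum, outlier_visits, outlier_rates):
    """Lump sum for the 120-day at-risk period plus fixed per-visit outlier rates.

    outlier_visits: visits by type after day 120 that do not begin a new episode.
    outlier_rates: fixed payment per visit, by visit type (hypothetical values).
    """
    outlier = sum(outlier_rates[kind] * n for kind, n in outlier_visits.items())
    return lump_sum + outlier


def control_payment(visits, agency_cost, area_mean_cost, limit_percent=112):
    """Actual cost, capped at limit_percent of the area mean cost for the same visit mix."""
    actual = sum(agency_cost[kind] * n for kind, n in visits.items())
    mean_based = sum(area_mean_cost[kind] * n for kind, n in visits.items())
    return min(actual, limit_percent * mean_based / 100)


# Hypothetical treatment episode: the lump sum does not depend on at-risk visits.
paid_treatment = treatment_payment(
    lump_sum=2500.0,
    outlier_visits={"skilled_nursing": 3},
    outlier_rates={"skilled_nursing": 90.0},
)

# Hypothetical control agency whose costs exceed the area-based limit.
visits = {"skilled_nursing": 10, "home_health_aide": 20}
paid_control = control_payment(
    visits,
    agency_cost={"skilled_nursing": 100.0, "home_health_aide": 50.0},
    area_mean_cost={"skilled_nursing": 80.0, "home_health_aide": 40.0},
)
print(paid_treatment, paid_control)
```

In the sketch, the treatment agency's revenue for the at-risk period is fixed, so any visit it avoids lowers cost without lowering payment. The control agency is reimbursed for each additional visit up to the limit.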

B.  SUMMARY  OF  PREVIOUS  FINDINGS 

Our  earlier  reports  on  the  first  two  years  of  the  demonstration  contained  promising  results  for 
the  demonstration.  In  those  reports,  we  found  substantial  reductions  in  service  provision  by 
prospectively  paid  agencies  (Trenholm  2000).  Despite  these  reductions  in  service,  however,  we 
found  no  important  impacts  on  quality  of  care  (Chen  2000),  patient  access  or  retention  (Trenholm 
2000),  Medicare  costs  (Schore  2000),  or  use  of  non-Medicare  services  (Phillips  2000). 

C.  GUIDE  TO  THE  REST  OF  THIS  REPORT 

The  next  chapter  of  this  report  describes  the  data  sources,  samples,  and  variables  used  to  study 
quality  of  care,  and  the  statistical  methods  used  to  estimate  the  effects  of  the  demonstration.  Chapter 
III  presents  the  results  of  the  analyses,  while  Chapter  IV  summarizes  the  key  findings  and  presents 
our  conclusions.  This  report  is  a  companion  to  the  update  report  on  service  use  by  Archibald  and 
Cheh (2000), which used data from the third year of the demonstration to study changes over time
in demonstration impacts on utilization; our results will be presented in the context of their findings.

II. DATA AND METHODS


This report presents analyses of two types of data: Medicare claims and enrollment data, and
demonstration Quality Assurance (QA) data. Each measures different outcomes.^1 The Medicare
data measure mortality and use of institutional provider services covered under Medicare Part A
(hospital, skilled nursing facility [SNF], and home health care services). The QA data measure home
health agency staff assessments of patient function and health services use.

^1 The survey analyzed in the last report was administered only to patients admitted to
demonstration agencies between January and August of 1997. Survey data are thus available only
for the early and middle part of agencies' second demonstration years. No survey data are available
for agencies' first or third demonstration years.

A. CONSTRUCTING THE MEDICARE CLAIMS DATA FILE

We used data from UB-92 bill record files obtained from Palmetto Government Benefits
Administrator (PGBA), the demonstration fiscal intermediary, to identify home health episodes as
defined by demonstration rules. We used the UB-92 bill records, rather than claims data from the
Standard Analytic File (SAF), because the UB-92 data contain data on patient characteristics at
admission (collected for the demonstration case-mix adjuster), which we then used to construct
several key control variables. We scanned the UB-92 files, beginning with each agency's enrollment
in the demonstration, to identify the first admission for an individual and the complete set of that
person's subsequent bill records. The first demonstration admission determined the episode start
date from which we constructed the first record for the patient. To determine the end of the initial
episode and the start of any subsequent episodes, we tracked bill records until we observed a 45-day
gap in care that began after the end of the at-risk period (that is, after the first 120 days from
admission). This procedure was followed regardless of whether the agency discharged and
readmitted a patient within the first 165 (120 + 45) days after initial admission. If we observed a
readmission for a patient after 165 days, and a 45-day gap in care had taken place, we created a
second record for the patient corresponding to this second demonstration episode. We then repeated
the process until we had constructed, for each patient, a series of records for all the episodes
beginning in the agency between its demonstration start and August 31, 1998. Since our claims file
ended at December 31, 1998, we did not include any admissions after August 31, 1998, to ensure
a full 120 days of follow-up on all observations.
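
The episode rules above can be sketched in Python. This is a minimal illustration, not the programs used for the evaluation: the function name is hypothetical, the input is simplified to a list of service dates for one patient at one agency, and the boundary conventions (for example, treating a gap of exactly 45 days as sufficient) are assumptions.

```python
from datetime import date, timedelta

AT_RISK_DAYS = 120  # at-risk period, measured from the episode start date
GAP_DAYS = 45       # break in care required before a new episode can begin


def split_into_episodes(service_dates):
    """Group one patient's service dates at one agency into episodes.

    A new episode begins only when a service date falls at least GAP_DAYS
    after the later of (a) the previous service date and (b) the end of the
    current episode's at-risk period.
    """
    episodes = []
    current = []
    for service_date in sorted(service_dates):
        if current:
            at_risk_end = current[0] + timedelta(days=AT_RISK_DAYS)
            gap_start = max(current[-1], at_risk_end)
            if (service_date - gap_start).days >= GAP_DAYS:
                episodes.append(current)
                current = []
        current.append(service_date)
    if current:
        episodes.append(current)
    return episodes


# Hypothetical patient. The April visit follows a break of more than 45 days,
# but the break began inside the at-risk period, so no new episode starts.
visits = [date(1996, 1, 1), date(1996, 2, 1), date(1996, 4, 15), date(1996, 7, 1)]
episodes = split_into_episodes(visits)
print([episode[0].isoformat() for episode in episodes])
```

Under these assumptions the earliest possible start of a second episode is day 165 after the first admission, which matches the 120 + 45 rule in the text.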

1.    Claims  Outcome  Variables 

We  merged  the  episodes  identified  from  UB-92  bill  record  data  to  the  Medicare  enrollment 
database  (EDB),  which  provided  information  on  dates  of  death,  and  the  Medicare  SAF,  which 
provided  information  on  Medicare  Part  A  health  services  use.  The  outcomes  variables  measured 
from  this  data  were  mortality  and  "same  body  system  admission"  to  a  hospital,  SNF,  or  home  health 
care within 120 days after admission (see Table II.1).

A  "same  body  system  admission"  was  an  admission  for  a  diagnosis  involving  the  same  body 
system  as  the  condition  for  which  the  patient  was  originally  admitted  to  the  demonstration  home 
health  agency.  The  ICD-9  coding  manual  groups  ICD-9  diagnosis  codes  into  13  body  systems  (for 
example,  "Diseases  of  the  Circulatory  System").  An  admission  to  hospital,  SNF,  or  home  health  care 
was  counted  as  an  outcome  only  if  the  principal  or  first  additional  ICD-9  diagnosis  for  that 
admission  fell  into  the  same  body  system  as  the  principal  or  first  additional  ICD-9  diagnosis  for  the 
original  home  health  admission.  Since  the  earliest  that  prospectively  paid  agencies  could  submit 
another  claim  for  a  new  episode  was  165  days  following  admission,  we  would  not  observe  any 
readmissions to treatment agencies in the 120-day period after an admission. We thus counted same
body system admissions only to home health agencies other than the original admitting agency for
both treatment and control agencies.

TABLE II.1

OUTCOME VARIABLES FROM MEDICARE CLAIMS AND ENROLLMENT DATA

Death Within 120 Days of Admission to a Demonstration Agency

Admission Within 120 Days of Admission to a Demonstration Agency for a Same Body
System Diagnosis^a to:

    Hospital
    Skilled Nursing Facility
    Home Health Agency^b

^a Admission for a "Same Body System Diagnosis" was an admission for a diagnosis involving the
same body system as the condition for which the patient was originally admitted to the
demonstration home health agency. The ICD-9 coding manual groups ICD-9 diagnosis codes into
13 body systems (for example, "Diseases of the Circulatory System"). An admission to hospital,
SNF, or home health care was counted as an outcome only if the principal or first additional ICD-9
diagnosis for that admission fell into the same body system as the principal or first additional ICD-9
diagnosis for the original home health admission.

^b Since treatment agencies could not initiate another episode in the first 165 days following an
admission, in the initial 165-day period for both treatment and control episodes, we counted only
same body system admissions to home health agencies different from the original admitting agency.
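
The "same body system" rule can be illustrated with a small Python sketch. The grouping table below is an assumption: it lists only four standard ICD-9-CM chapter ranges for illustration and is not the 13-system grouping used in the analysis. The function names are hypothetical, and V and E codes are simply left unclassified.

```python
# Illustrative subset of ICD-9-CM chapter ranges (three-digit categories).
ILLUSTRATIVE_SYSTEMS = [
    (390, 459, "circulatory"),
    (460, 519, "respiratory"),
    (520, 579, "digestive"),
    (710, 739, "musculoskeletal"),
]


def body_system(icd9_code):
    """Return the body system for a code such as '428.0', or None if unclassified."""
    try:
        category = int(icd9_code.split(".")[0])
    except ValueError:
        return None  # V and E codes are not handled in this sketch
    for low, high, name in ILLUSTRATIVE_SYSTEMS:
        if low <= category <= high:
            return name
    return None


def is_same_body_system(original_dx, admission_dx):
    """Compare the principal and first additional diagnoses of two admissions.

    Each argument is a list of ICD-9 codes with the principal diagnosis first.
    Only the first two codes on each side are considered.
    """
    original = {body_system(code) for code in original_dx[:2]} - {None}
    later = {body_system(code) for code in admission_dx[:2]} - {None}
    return bool(original & later)


# Hypothetical home health admission for heart failure with pneumonia listed second,
# followed by a hospital admission whose principal diagnosis is pneumonia.
match = is_same_body_system(["428.0", "486"], ["486", "250.00"])
no_match = is_same_body_system(["428.0"], ["715.90"])
print(match, no_match)
```

A full implementation would also need the rule in Table II.1, note b, that home health admissions count only when they are to an agency other than the original admitting agency.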

Because  our  focus  was  on  changes  over  the  course  of  the  demonstration,  we  wanted  to  maximize 
the  sample  size  in  each  of  the  three  years  of  the  demonstration.  Because  our  claims  file  contained 
data  only  through  December  31,  1998,  we  limited  the  follow-up  period  to  120  days  after  admission 
(the  "at  risk"  period  for  prospectively  paid  agencies).  Most  of  the  agencies  joined  the  demonstration 
in  January  1996.  Had  we  tried  to  examine  8-  and  12-month  follow-up  periods  (as  we  did  in  the 
previous quality report), the number of admissions we could have analyzed during these agencies'
third  demonstration  years  would  have  been  extremely  small. 

2.    Claims  Sample 

From an initial file of 174,699 records, we excluded a number of records to arrive at the final
analysis file. To ensure a constant sample of agencies across demonstration years, we dropped
records from nine agencies that had dropped out at any point during the demonstration.^2 We also
dropped records from two agencies (both treatment agencies) that had too few observations to analyze
in any of the three demonstration years, leaving 80 agencies.^3 We dropped 3,342 episodes with zero
home health visits and 4,900 with missing control variable data (primarily patient characteristics
from the UB-92 remarks). We excluded 14,893 episodes of patients who had been in Medicare health
maintenance organizations (HMOs) or for whom Medicare was a secondary payer on or after the
episode start date, and 3,248 episodes occurring during the agency's phase-out period (because the
services provided to these patients are not subject to either prospective payment or traditional cost
reimbursement, and agencies' behavior might be largely independent of the demonstration
incentives).

^2 The loss of these nine agencies is unlikely to have caused substantial attrition bias in our
results. Three of the nine agencies left at the very start of the demonstration and so did not, in fact,
"drop out." (All three were control agencies and had also been excluded from the claims analysis
of our previous quality report.) Of the remaining six, three treatment agencies and one
control agency closed or stopped accepting Medicare patients for reasons unrelated to the
demonstration. The other two treatment agencies were purchased by a large chain whose
management no longer wished the agencies to participate. Thus, of the nine, only these latter two
truly "dropped out" of the demonstration in a way that led to missing or unobserved quality outcomes;
the others had no data to contribute after the time they ceased to function. To the extent that
agencies that are attractive to large chains for purchase differ systematically in quality of care from
other agencies, the two missing agencies might cause some bias in our impact estimates;
however, we doubt that this bias would be extremely large.

^3 One of these two agencies had been included in the claims analysis of the previous quality
report. However, this agency had only two episodes in the second demonstration year and one
episode in the third year. The other agency had been excluded from analysis in the previous
report as well.

Finally,  to  avoid  a  possible  bias  in  estimating  demonstration  impacts,  we  dropped  19,450 
records  that  represented  a  repeat  demonstration  episode,  to  reach  our  final  claims  analysis  file  of 
123,567  observations.  We  dropped  these  records  primarily  because  (1)  patients  entering  repeat 
episodes  are  likely  to  be  at  different  points  in  their  illnesses  than  patients  entering  their  first  episode, 
and  consequently  their  risks  of  experiencing  outcomes  (such  as  functional  deterioration,  mortality, 
or  hospitalization)  are  likely  to  differ  as  well,  and  (2)  repeat  episodes  are  more  likely  among  patients 
of treatment agencies. Patients of prospectively paid agencies are more likely to be readmitted to
home health care simply because they spend a greater amount of time "out" of home health
care (due to their being discharged sooner). The increased likelihood of treatment agency patients
to be readmitted to home health care would also distort the control variables based on prior home
health use (see Section C.1, below). To assess the sensitivity of the results to excluding previously
admitted  patients,  we  reanalyzed  key  outcomes  after  restoring  these  patients  to  the  sample.  Table 
II.2  summarizes  the  construction  of  the  claims  analysis  file. 
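
As a check on the arithmetic in this section, the exclusions can be tallied in a few lines of Python. The counts are those given in the text; the variable names are ours. The text does not report how many records were dropped with the eleven excluded agencies, so the sketch derives that figure as a residual, on the assumption that the exclusion categories do not overlap.

```python
INITIAL_RECORDS = 174_699
FINAL_RECORDS = 123_567

# Episode-level exclusions with counts stated in the text.
listed_exclusions = {
    "zero home health visits": 3_342,
    "missing control variable data": 4_900,
    "Medicare HMO or Medicare as secondary payer": 14_893,
    "agency phase-out period": 3_248,
    "repeat demonstration episode": 19_450,
}

listed_total = sum(listed_exclusions.values())

# Residual attributed to the eleven excluded agencies (nine that left the
# demonstration and two with too few observations). This is an inference
# from the stated totals, not a count reported in the text.
implied_agency_records = INITIAL_RECORDS - FINAL_RECORDS - listed_total

print(listed_total, implied_agency_records)
```

If the categories overlap, or if the exclusions were applied in a different order, the residual would differ from the count shown in Table II.2.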

B.    QUALITY  ASSURANCE  DATA 

The  demonstration  QA  contractor,  the  Center  for  Health  Policy  Research  (CHPR)  at  the 
University  of  Colorado,  designed  and  implemented  a  patient  outcome-based  quality  monitoring  and 
continuous  improvement  system  for  the  demonstration.  All  demonstration  agencies  were  required
to  collect  and  submit  QA  information  to  CHPR.  The  QA  data  collection  instruments  are  scaled- 
down  versions  of  CHPR's  full  Outcome  Assessment  System  Information  Set  (OASIS;  Shaughnessy 
et  al.  1995).  CHPR  calculated  agency-level,  risk-adjusted  profiles  of  patient  outcomes  from  the  QA 
data,  which  were  then  regularly  "fed  back"  to  the  agencies  to  help  them  improve  their  quality  of  care 
(Shaughnessy  et  al.  1995).  Most  demonstration  agencies  had  implemented  QA  data  collection  by 
May  1996. 

A  record  in  the  QA  data  consists  of  a  pair  of  patient  assessments  by  agency  nurses.  The  first 
assessment  occurs  either  at  initial  admission  to  home  health  care  or  at  a  resumption  of  home  health 
care  after  a  hospital  stay  of  48  hours  or  more.  The  nurses  record  this  first  assessment  in  a  QA 
start/resumption  of  care  instrument.  The  second  assessment  takes  place  at  whichever  of  the 
following  occurs  first:  discharge,  120  days  after  admission,  or  the  last  home  health  visit  just  before 
an  inpatient  stay  of  48  hours  or  more.  The  nurses  record  the  second  assessment  on  a  follow- 
up/discharge  instrument. 

1.    QA  Outcome  Variables

The  QA  outcome  variables  fall  into  two  broad  categories:  (1)  health  measures,  and  (2)
emergency  services  use.  Table  II.3  provides  a  summary  list  of  the  QA  outcome  variables.

a.    Health  Measures 

TABLE  II.3

OUTCOME  VARIABLES  FROM  QUALITY  ASSURANCE  DATA

Health  Measures:  Improvement  or  Stabilization  in:

    Basic  Activities  of  Daily  Living  (BADLs)
        Grooming
        Bathing
        Toileting
        Transferring
        Ambulation

    Instrumental  Activities  of  Daily  Living  (IADLs)
        Light  meal  preparation
        Housekeeping
        Management  of  oral  medications

    Clinical  Symptoms
        Pain
        Pressure  ulcer  count
        Most  problematic  pressure  ulcer
        Surgical  wound  status
        Dyspnea
        Urinary  tract  infection
        Urinary  incontinence  or  catheter  present
        Confusion
        Behavior  problem  frequency

Emergency  Services  Use

    Reported  Emergency  Visit  to:
        Hospital  emergency  room
        Outpatient  clinic  or  urgent  care  center
        Physician's  office
        Any  of  the  above

We  studied  17  health  measures  in  the  QA  data,  including  basic  and  instrumental  Activities  of
Daily  Living  (ADLs;  for  example,  bathing,  grooming,  dressing,  eating,  transferring,  management
of  oral  medications,  light-meal  preparation),  and  clinical  symptoms  (for  example,  pain  interfering
with  activity,  dyspnea,  confusion).  Items  are  scored  at  each  assessment  on  an  ordinal  severity  scale.
For  instance,  the  item,  "How  often  does  pain  interfere  with  the  patient's  activity/movement,"  has  a
four-level  response:  none,  some,  most,  and  all  of  the  time.  Binary  change  variables  of
"improvement"  or  "stabilization"  in  each  item  are  calculated  from  the  admission  and  follow-up
scores.  Improvement  in  a  measure  has  the  value  one  if  a  patient  improves  on  the  scale  for  that
item  at  follow-up,  and  zero  otherwise.  By  definition,  patients  who  start  the  home  health  episode  at  the
best  level  of  a  measure  are  excluded  from  the  sample  for  improvement  in  that  measure  (they  cannot
further  improve  in  that  measure).  Stabilization  in  a  measure  has  the  value  one  if  a  patient  does  not
worsen  on  the  scale  for  that  measure  at  follow-up  (that  is,  the  patient  remains  the  same  or  improves),
and  zero  otherwise.  Patients  who  start  the  episode  at  the  worst  level  of  a  measure  are  similarly  excluded  from
stabilization  in  that  measure,  since  they  cannot  worsen.  Thus,  the  number  of  patients  "eligible"  for
improvement  or  stabilization  varied  for  each  measure.
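
The definitions of the improvement and stabilization variables can be sketched in a few lines of Python. This is an illustration only, not the code used for the report. The function names are hypothetical, and the coding convention (0 is the best level, larger integers are worse) is an assumption; the report states only that items are scored on an ordinal severity scale.

```python
from typing import Optional


def improvement(start: int, follow_up: int, best: int = 0) -> Optional[int]:
    """Return 1 if the patient improved, 0 if not, None if ineligible.

    Assumes lower scores are better (hypothetical coding, not from the report).
    """
    if start == best:
        return None  # already at the best level, so excluded from the sample
    return 1 if follow_up < start else 0


def stabilization(start: int, follow_up: int, worst: int) -> Optional[int]:
    """Return 1 if the patient did not worsen, 0 if worsened, None if ineligible."""
    if start == worst:
        return None  # already at the worst level, so excluded from the sample
    return 1 if follow_up <= start else 0
```

Under this assumed coding, the four-level pain item would run from 0 (none of the time) to 3 (all of the time), so `worst=3`. Returning `None` for ineligible patients makes explicit why the number of eligible patients differs from measure to measure.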

b.    Reported  Emergency  Health  Services  Use 

Based  on  reports  by  patient,  family,  or  other  providers,  agency  nurses  also  recorded  on  the  QA 
follow-up/discharge  instrument  any  emergency  visits  to  hospital  emergency  rooms,  physician  offices, 
outpatient  clinics,  or  freestanding  urgent  care  centers  since  the  last  completed  instrument.  We  were 
careful  to  also  include  emergency  visits  recorded  on  any  interim  instruments  (that  is,  follow- 
up/discharge  instruments  that  were  completed  prior  to  a  hospital  stay  but  were  not  the  final  discharge 
instrument). 

2.    QA  Sample 

The  initial  QA  file  that  CHPR  provided  to  us  contained  126,876  records,  each  representing  a 
pair  of  initial  assessment  and  follow-up  instruments  for  a  patient  admitted  to  home  health  care.  We 
removed  the  records  of  4,934  patients  admitted  during  agencies'  phase-out  or  post-demonstration
periods.  We  also  excluded  5,421  episodes  that  either  started  with  a  return  to  home  health  after
hospitalization  or  ended  with  a  discharge  from  home  health  because  of  hospitalization.  Such  patients
are  being  assessed  at  different  points  over  the  course  of  illness  than  those  observed  because  of  first
admission  to  home  health  care,  discharge  from  home  health  care,  or  reaching  the  end  of  the  at-risk
period.  We  merged  the  remaining  116,521  QA  records  with  Medicare  claims  data,  dropping  another
21,278  records  that  did  not  match  exactly  with  a  claims  record  by  the  patient's  HIC  number,  the
agency's  provider  number,  and  the  start  of  care  date  (thus,  matching  roughly  82  percent  of  the  QA
records  with  a  claims  record).  CHPR  had  already  removed  10  agencies  from  the  file  they  furnished
us:  the  9  that  had  dropped  out  of  the  demonstration,  and  1  that  failed  to  transmit  a  sufficient  number
of  cases  to  CHPR.  We  removed  observations  from  three  other  agencies  that  did  not  have
observations  in  every  demonstration  year  (again,  to  ensure  a  constant  sample  of  agencies  across  all
years),  for  a  total  of  1,337  records.  We  then  excluded  10,293  patients  for  the  following  reasons:  their
episodes  lasted  one  day  or  less,  their  follow-up  instrument  was  based  on  a  visit  more  than  135  days
after  admission,  they  were  missing  control  variables,  or  they  were  in  an  HMO  or  had  Medicare  as
a  secondary  payer  at  some  point  during  the  home  health  episode.  Finally,  we  dropped  15,116  records
for  which  the  admission  was  a  repeat  demonstration  episode,  leaving  us  with  a  final  QA  analysis  file
of  69,834  observations  from  76  agencies.⁴  Table  II.4  summarizes  the  construction  of  the  QA
analysis  file.


⁴Ten  of  the  15  deleted  agencies  were  included  in  the  QA  analysis  in  the  previous  report  on
quality.
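
The exact-match merge of QA records to claims records described above can be illustrated with a short Python sketch. It is not the procedure actually used for the report; the field names `hic`, `provider`, and `start_of_care` are hypothetical stand-ins for the patient's HIC number, the agency's provider number, and the start of care date.

```python
def link_records(qa_records, claims_records):
    """Split QA records into those with and without an exact claims match.

    A match requires equality on all three (hypothetical) key fields.
    """
    def key(record):
        return (record["hic"], record["provider"], record["start_of_care"])

    claims_by_key = {key(c): c for c in claims_records}
    matched, unmatched = [], []
    for qa in qa_records:
        claim = claims_by_key.get(key(qa))
        if claim is None:
            unmatched.append(qa)  # dropped from the analysis file
        else:
            matched.append({**claim, **qa})  # combined QA and claims record
    return matched, unmatched
```

With the counts reported in the text, 116,521 - 21,278 = 95,243 QA records found a claims match, which is about 82 percent of 116,521.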
C.   CONTROL  AND  SUBGROUP  VARIABLES

Because  of  the  relatively  small  number  of  agencies  randomized  and  the  differential  attrition  of 
control  and  treatment  agencies,  there  was  a  possibility  of  imbalance  in  important  baseline 
characteristics  between  the  treatment  and  control  groups.  We  thus  added  to  each  data  source  a 
standard  set  of  control  variables  (that  is,  a  set  of  control  variables  to  be  included  in  all  regressions, 
no  matter  the  outcome  or  data  source),  to  adjust  for  any  such  imbalances.  The  control  variables, 
listed  in  Table  II.5,  measure  characteristics  at  three  levels:  patient,  agency,  and  area.

1.    Standard  Control  Variables  for  Patient  Characteristics

Individuals  who,  on  admission  to  home  health  care,  are  more  severely  ill  or  more  functionally 
impaired,  or  who  carry  certain  diagnoses  have  a  higher  likelihood  (unrelated  to  the  quality  of  home 
health  care)  of  suffering  adverse  outcomes  such  as  admission  to  hospital,  functional  decline,  or 
death.  We  drew  the  standard  set  of  control  variables  on  patient  characteristics  from  three  data 
sources:  (1)  the  "Remarks"  fields  from  the  home  health  UB-92  bills,  (2)  the  Medicare  EDB,  and  (3)
the  Medicare  SAF.  In  the  "Remarks"  field  for  the  first  UB-92  bill  following  a  demonstration 
admission,  all  agencies  were  required  to  submit  information  on  patient  characteristics  needed  for  the 
demonstration  case-mix  adjuster.  The  characteristics  included  measures  of  impairment  in  ADLs, 
along  with  whether  the  patient  had  certain  medical  conditions  (cancer,  diabetes,  decubitus  ulcers), 
and  care  needs  (complex  wound  care). 

Medicare  enrollment  data  provided  us  with  basic  patient  demographic  information,  including 
the  patient's  age  (at  the  start  of  the  home  health  episode),  gender,  race,  and  the  original  reason  for 
Medicare  qualification  (for  example,  age  or  disability).  From  the  SAF,  we  constructed  measures  of 
prior  Medicare  service  use  to  reflect  the  patient's  severity  of  illness,  including  measures  of  both 
recent  acute  illness  (whether  admitted  from  hospital,  length  of  prior  hospital  stay)  and  longer-term 
illness  (home  health  care  and  hospital  use  in  the  six  months  prior  to  home  health  admission).  For
beneficiaries  less  than  65.5  years  old  at  home  health  admission,  we  used  the  mean  value  for
beneficiaries  between  65.5  and  66  years  old  as  a  proxy  measure,  since  beneficiaries  under  age  65.5
would  not  have  been  eligible  for  Medicare  service  for  a  full  six  months.
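
The proxy rule for beneficiaries under age 65.5 amounts to a simple mean substitution, sketched below in Python. The field names `age` and `prior_use` are hypothetical, and treating the reference group as ages 65.5 up to but not including 66 is an assumption about how the boundaries were handled.

```python
def impute_prior_use(patients):
    """Replace prior-use values for patients under 65.5 with a proxy mean.

    The proxy is the mean prior use among patients aged 65.5 to 66
    (assumed here to mean 65.5 <= age < 66).
    """
    reference = [p["prior_use"] for p in patients if 65.5 <= p["age"] < 66.0]
    if not reference:
        raise ValueError("no beneficiaries aged 65.5 to 66 to form the proxy")
    proxy = sum(reference) / len(reference)
    return [
        {**p, "prior_use": proxy} if p["age"] < 65.5 else dict(p)
        for p in patients
    ]
```

The function returns new records rather than modifying its input, so the observed values remain available for comparison.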

Because  of  treatment-control  differences  in  the  times  when  agencies  entered  the  demonstration, 
we  included  variables  for  both  the  calendar  time  period  and  the  demonstration  year  for  each  agency. 
There  were  indicator  variables  for  the  quarter  of  each  calendar  year  that  patients  were  admitted  to 
home  health  care,  as  well  as  indicator  variables  for  whether  the  patient's  admitting  home  health 
agency  was  in  its  first,  second,  or  third  year  in  the  demonstration. 

2.    Standard  Control  Variables  for  Agency  Characteristics

The  characteristics  of  home  health  agencies  may  influence  the  mix  of  home  health  services 
delivered  or  the  types  of  patients  served.  Hospital-based  agencies,  for  example,  might  serve  more 
acutely  ill  patients  than  freestanding  agencies.  To  control  for  any  treatment-control  differences  in 
agency  characteristics,  we  used  data  from  base-year  Medicare  cost  reports.  These  reports  provided
information  on  agencies'  base-year  characteristics,  including  for-profit  status,  affiliation,  and  size
(as  measured  by  total  number  of  visits  rendered).  The  demonstration  implementation  contractor
gathered  information  on  agencies'  chain  membership  and  rural  location  during  demonstration
recruitment.  For  each  agency,  we  calculated  a  predemonstration  practice  pattern  index  as  the
case-mix-adjusted  ratio  of  the  average  number  of  visits  the  agency  made  to  its  patients  in  the  120  days
after  admission  during  its  predemonstration  base  quarter  to  the  average  number  made  by  other
demonstration  agencies.⁵  A  practice  pattern  index  value  greater  than  one  indicates  that,  controlling
for  differences  in  case  mix  during  the  quarter  preceding  the  demonstration,  an  agency  provided  more
visits  in  the  120-day  period  than  did  other  demonstration  agencies.  The  agency  practice  pattern
variable  may  affect  patient  outcomes  either  directly  as  a  measure  of  home  health  care  intensity  or
indirectly  as  a  reflection  of  agency  and  area  characteristics  prior  to  the  demonstration  (for  example,
agency  ownership,  local  market  characteristics).

⁵See  Trenholm  (2000a)  for  details  on  the  calculation  of  this  practice  pattern  index.
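
As a rough illustration of the practice pattern index, the Python sketch below computes the ratio of one agency's mean visits per patient to the mean for all other agencies. It deliberately omits the case-mix adjustment, whose details are given in Trenholm (2000a) and are not reproduced in this report, and it assumes the comparison mean pools the patients of all other agencies. The function and argument names are hypothetical.

```python
def practice_pattern_index(visits_by_agency, agency):
    """Unadjusted ratio of an agency's mean visits to that of other agencies.

    visits_by_agency maps an agency id to a list of per-patient visit counts
    in the 120 days after admission during the base quarter.
    """
    own = visits_by_agency[agency]
    others = [
        visits
        for other_agency, patient_visits in visits_by_agency.items()
        if other_agency != agency
        for visits in patient_visits
    ]
    own_mean = sum(own) / len(own)
    others_mean = sum(others) / len(others)
    return own_mean / others_mean
```

A value above one marks an agency that provided more visits than its peers before the demonstration began, in this simplified version without controlling for case mix.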

3.  Standard  Control  Variables  for  Area  Characteristics 

We  also  controlled  for  area-level  characteristics  that  might  influence  the  outcomes  under  study. 
For  example,  likelihood  of  admission  to  a  skilled  nursing  facility  or  to  home  health  care  may  depend 
on  the  local  supply  of  nursing  home  beds.  From  the  1996  Area  Resource  File  of  the  Health 
Resources  and  Services  Administration  (HRSA),  which  reports  data  from  previous  years,  we 
obtained  or  constructed  the  number  of  physicians  per  10,000  residents,  the  number  of  nursing  home
beds  per  100  elderly  residents,  and  hospital  occupancy  rates. 

4.  Control  Variables  Specific  to  the  Analyses  of  the  Medicare  Claims  File  or  QA  Files 

In  addition  to  the  standard  set  of  control  variables  described  above,  we  included  a  few  control 
variables  specific  to  each  data  source  in  the  analysis  of  that  data  source.  Because  the  claims 
outcomes  focused  primarily  on  use  of  health  services,  we  included  two  additional  control  variables 
measuring  prior  use:  (1)  the  total  number  of  hospitalizations,  and  (2)  the  total  number  of  SNF
admissions,  both  in  the  six  months  prior  to  the  index  home  health  admission. 

For  the  analyses  of  the  QA  outcomes,  the  QA  data  contained  a  rich  set  of  patient  variables  from 
which  the  standard  set  of  control  variables  could  potentially  be  augmented,  but  the  statistical 
problems  of  "overfitting"  and  "complete  separation"  limited  the  number  of  additional  control 
variables  we  could  include  (Hosmer  and  Lemeshow  1989).  Overfitting  occurs  when  the  data  are 
spread  out  over  too  many  cross-classifications  of  outcomes  and  independent  variables,  leading  to 
data  cells  with  zero  observations.  Complete  separation  occurs  when  there  is  no  overlap  in  the
distribution  of  the  independent  variables  between  the  two  outcome  groups,  so  that,  in  other  words, 
a  pattern  of  the  independent  variables  discriminates  perfectly  between  observations  with  and  without 
the  outcome.  Both  problems  manifest  themselves  through  implausibly  large  estimated  coefficient 
values  and/or  standard  errors.  Both  problems  are  especially  likely  when  sample  sizes  are  small, 
either  outcomes  or  nonoutcomes  are  rare,  and  there  are  large  numbers  of  regressor  variables, 
especially  ones  with  low  incidences  in  the  sample.  We  thus  faced  the  overfitting  and  separation 
problems  for  such  outcomes  as  improvement  in  most  problematic  pressure  ulcer,  and  improvement 
and  stabilization  in  surgical  wound  status,  where  sample  sizes  were  modest  (a  few  thousand)  and 
failure  to  improve  or  stabilize  was  rare.  The  smaller  sample  sizes  for  the  subgroup  analyses  only 
exacerbated  these  situations. 
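
A simple special case shows what complete separation looks like in data. The Python sketch below checks whether a single numeric predictor splits the two outcome groups with no overlap; the function name is hypothetical, and the one-variable check is a simplification, since separation can also arise from a combination of several regressors. When separation is present, the logistic likelihood has no finite maximum, which is why fitted coefficients and standard errors become implausibly large.

```python
def separates_completely(x, y):
    """True if every x for y == 1 lies strictly on one side of every x for y == 0."""
    x_pos = [xi for xi, yi in zip(x, y) if yi == 1]
    x_neg = [xi for xi, yi in zip(x, y) if yi == 0]
    if not x_pos or not x_neg:
        return False  # only one outcome group present, nothing to separate
    return max(x_neg) < min(x_pos) or max(x_pos) < min(x_neg)
```

For example, `separates_completely([1, 2, 3, 4], [0, 0, 1, 1])` is true, while interleaved values such as `[1, 3, 2, 4]` with the same outcomes are not separated.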

We  followed  the  same  strategy  we  used  in  our  previous  quality  report  to  deal  with  these
problems:  we  included  as  new  control  variables  only  those  expected  to  be  correlated  with  improvement
or  stabilization  in  function,  symptom  severity,  and  emergency  care  use,  and  we  dropped  from  the
model  any  of  the  standard  control  variables  with  a  less  than  five  percent  incidence  in  the  sample.⁶
We  thus  added  a  number  of  admission  QA  assessment  variables,  listed  in  Table  II.6,  to  regressions
as  control  variables.  The  following  standard  control  variables  had  a  less  than  five  percent  incidence
in  the  QA  sample,  and  were  thus  dropped  from  the  model:  coverage  by  Medicare  for  less  than  six
months,  membership  in  an  HMO  in  the  six  months  prior  to  home  health  admission,  having  Medicare
as  a  secondary  payer  in  the  six  months  prior  to  home  health  admission,  and  admission  by  a  nonurban
agency.


⁶We  did  not  use  these  additional  QA  control  variables  for  the  analyses  of  the  claims  data  because
of  the  nonoverlapping  time  periods  of  the  QA  and  claims  data.
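
The five percent incidence rule for dropping standard control variables is easy to express in code. The Python sketch below is illustrative only: it assumes each control is a 0/1 indicator stored under a hypothetical field name, and it computes unweighted incidence, whereas the report does not say whether incidence was weighted.

```python
def low_incidence_controls(records, variables, threshold=0.05):
    """Return the binary control variables whose sample incidence is below threshold."""
    n = len(records)
    return [
        name
        for name in variables
        if sum(record[name] for record in records) / n < threshold
    ]
```

Variables returned by this function would be the candidates for removal from the model, such as the four controls named in the text above.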
TABLE  II.6

ADDITIONAL  CONTROL  VARIABLES  FROM  QA  DATA

Symptoms  and  Conditions  at  Admission
    High-Risk  Factorsᵃ
    Medically  Unstable
    Expresses  Feelings  of  Depression
    Observed  to  Be  Depressed
    Demonstrates  Cognitive  Impairment

Prognosis  at  Admission
    Likelihood  That  Treatment  Can  Be  Taken  Overᵇ
    Prognosis  Is  Good/Fair
    Life  Expectancy  Less  than  Six  Months
    Rehabilitative  Prognosis  Is  Good

Availability  of  Informal  Care  at  Admission
    Live-In  Informal  Help
    Paid  Help  or  Residing  in  Assisted-Living  Residence

ᵃHas  any  of  the  following  risk  factors  for  poor  health  outcomes:  heavy  smoking,  obesity,  alcoholism,
or  drug  dependency.

ᵇBy  patient,  or  by  relatives,  friends,  neighbors,  or  paid  helpers  of  patient.
D.   SUMMARY  STATISTICS  FOR  CONTROL  VARIABLES

We  present  the  means  of  control  variables  in  Tables  II.7  and  II.8,  to  allow  the  reader  to  compare
the  two  data  sources  and  the  treatment  and  control  groups  and  to  gain  an  overview  of  patients  and
agencies  in  the  demonstration.  In  the  tables,  observations  are  weighted  to  give  each  agency  equal
representation  within  each  year.⁷
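
One straightforward way to give each agency equal representation within each year is to weight every observation by the inverse of its agency-year count, as in the Python sketch below. This is an assumed construction for illustration; the report does not spell out the weight formula, and the function name is hypothetical.

```python
from collections import Counter


def agency_year_weights(observations):
    """Weight each observation by 1 / (number of observations in its agency-year).

    observations is a list of (agency, year) pairs, one per patient record.
    """
    counts = Counter(observations)
    return [1.0 / counts[obs] for obs in observations]
```

With these weights, the observations of every agency sum to one within each year, so a large agency contributes no more to a weighted mean than a small one.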

1.    Patient  Characteristics 

Demonstration  patients  tended  to  be  elderly,  female,  and  both  chronically  and  acutely  ill. 
Slightly  fewer  than  two-thirds  were  age  75  or  older,  and  the  same  proportion  were  women.
One-quarter  were  poor  enough  to  qualify  for  Medicaid  payment  of  their  Medicare  premiums.  Most
patients  suffered  multiple  limitations  in  ADLs  and  from  chronic  illnesses  (diabetes  in  about  20 
percent,  a  history  of  stroke  in  about  14  percent,  and  a  history  of  cancer  in  13  percent).  In  the  QA 
data,  we  found  a  high  prevalence  of  baseline  affective  and  cognitive  symptoms  (nearly  one-quarter 
of  patients  having  depressed  feelings,  depressive  behaviors,  or  disruptive  behaviors).  A  large 
percentage  were  also  recovering  from  acute  illness,  with  roughly  40  percent  coming  to  home  health 
care  from  a  hospital  stay,  and  roughly  16  percent  having  been  in  a  SNF  during  the  two  weeks  before 
home  care.  Roughly  half  of  the  patients  were  judged  by  agency  staff  to  be  medically  unstable. 


⁷The  significance  levels  for  the  tests  of  equality  between  treatment  and  control  group  means  in
Tables  II.7  and  II.8  do  not  account  for  design  effects  due  to  the  clustering  of  patients  within
agencies.  Since  we  are  describing  only  our  analysis  samples,  without  generalizing  to  the  universe 
of  all  agencies,  there  is  no  need  to  account  for  clustering  effects.  We  do  account,  however,  for  the 
design  effects  associated  with  our  use  of  sample  weights.  In  our  impact  analyses,  in  contrast,  we 
account  for  design  effects  from  both  the  use  of  weights  and  the  clustering  of  patients  within  agencies. 
See  Chapter  III,  Sections  B  and  C,  for  a  discussion  of  the  effects  of  clustering  and  weighting  in  our 
impact  analysis. 
Patients  of  treatment  and  control  agencies  were  similar.  Differences  in  baseline  prognostic
factors,  both  within  each  year  and  over  three  years,  appeared  to  be  random,  with  some  differences 
favoring  control  agencies  and  others  treatment  agencies.  Because  of  the  large  sizes  of  the  QA  and 
claims  data  sets,  even  some  of  these  small  differences  reached  statistical  significance.  In  the  overall 
(aggregated  over  all  three  years)  comparison  to  treatment  agency  patients,  control  agency  patients 
were  younger  but  also  slightly  more  acutely  ill  (coming  to  home  health  care  directly  after  hospital 
discharge)  and  had  a  higher  prevalence  of  dressing  impairment,  risk  factors,  medical  instability, 
depression,  and  disruptive  behaviors,  and  somewhat  lower  levels  of  good  prognosis.  Both  the  QA 
and  claims  data  reflected  these  differences. 

Similarly,  the  time  trends  in  patient  characteristics  within  each  group  over  the  three  years  were 
reflected  in  both  data  sets  and  appeared  random.  Over  time,  patients  of  treatment  agencies  became 
somewhat  younger  and  less  likely  to  have  cancer,  but  they  also  became  more  likely  to  require 
complicated  wound  care  and  to  have  come  from  an  acute  hospital  stay.  Agencies  in  both  groups  had 
progressively  fewer  patients  with  eating  and  dressing  limitations  over  time,  although  these  changes 
seemed  slightly  more  pronounced  among  patients  of  control  agencies  than  among  patients  of 
treatment  agencies. 

2.    Agency  and  Area  Characteristics 

There  were  several  important  differences  between  the  agencies  assigned  to  the  two  groups.  For 
example,  there  were  substantially  more  hospital-based  agencies  among  cost-reimbursed  agencies 
than  prospectively  paid  agencies  (explaining,  perhaps,  the  greater  proportion  of  patients  among 
control  agencies  who  entered  home  health  care  from  the  hospital).  The  treatment  agencies,  on  the 
other  hand,  had  many  more  small  agencies  and  agencies  that  belonged  to  a  chain.  Treatment 
agencies  also  had  significantly  lower  base-year  practice  pattern  indices.  The  differences  between 
the  treatment  and  control  means  for  agency  characteristics,  in  the  claims  data  versus  the  QA  data,
are  quite  small. 

3.    Implications  for  the  Analysis  of  Program  Impacts 

It  is  reassuring  to  find  the  general  similarity  of  patients  and  agencies  across  the  two  different 
data  sources.  The  findings  from  any  one  of  the  data  sources  are  thus  likely  to  reflect  the 
demonstration  as  a  whole,  and  not  stem  from  peculiarities  of  that  particular  data  set.

Although  the  preexisting  treatment-control  differences  in  patient  characteristics  are  relatively 
small,  there  are  several  larger  differences  in  agency  and  area  characteristics.  For  example,  the 
treatment  agencies'  lower  mean  practice  pattern  index  already  indicates  a  baseline  difference  in 
intensity  of  service  provision.  In  an  unadjusted  comparison,  such  a  difference  could  lead  to  a 
spurious  treatment-control  impact  on  quality-of-care  outcomes.⁸

E.   STATISTICAL  METHODS 

The  differences  in  baseline  variables  at  all  three  levels  (patient,  agency,  and  area),  despite 
random  assignment  of  agencies,  underscore  the  importance  of  estimating  program  impacts  with 
regression  models  that  control  for  such  confounding  factors.  Regression  models  also  improve 
statistical  precision.  We  used  logistic  regression  because  all  the  dependent  variables  in  this  report 
are  binary.  We  weighted  each  observation  so  as  to  give  each  agency  equal  representation  in  the 
analysis.⁹


⁸See  Foster  (2000)  for  a  discussion  of  agency  characteristics  and  how  they  affect  the
generalizability  of  our  findings. 

⁹In  the  previous  report,  we  dealt  with  excessively  large  weights  by  trimming  them  at  the  85th
percentile  of  the  distribution  (see  Chen  2000  for  details).

We  also  conducted  several  sensitivity  analyses  to  examine  the  robustness  of  our  impact
estimates  to  such  factors  as  data  censoring,  outliers,  and  the  use  of  sample  weights.  All  estimates 
from  regression  models  have  standard  errors  that  take  into  account  the  effects  of  sample  clustering 
and  weighting. 

Throughout  our  tables  of  results  in  Chapter  III,  we  present  the  regression-adjusted  means  of  the
control  group  as  points  of  reference  alongside  the  estimated  treatment-control  differences.  The
adjusted  control  group  means  allow  us  to  judge  the  relative  magnitudes  of  the  estimated
treatment-control  differences.¹⁰

¹⁰In  previous  reports  we  presented  the  unadjusted  control  group  means  as  reference  points  (Chen
2000,  and  Chen  and  Noveck  199x).  The  previous  reports,  however,  focused  on  comparisons  of  the
treatment  and  control  group  data  pooled  over  time,  whereas  this  report  focuses  on  treatment  and
control  group  comparisons  within  each  demonstration  year.  We  found  the  multiple  unadjusted
means  of  the  control  group  within  each  year  to  exhibit  a  misleading  variability  (even  more  so  for  the
subgroup  analyses),  so  we  decided  to  use  regression-adjusted  means  instead.  The  control  group
regression-adjusted  means  are  calculated  using  the  entire  analysis  sample  (both  treatment  and  control
observations).  For  each  observation,  the  treatment/control  indicator  variable  is  set  to  the  control
group  value  of  zero,  the  indicator  variables  for  demonstration  year  (and  subgroup  for  the  subgroup
analyses)  are  set  to  the  appropriate  values,  and  the  expected  value  for  that  observation  is  calculated.
From  these,  the  overall  control  group  mean  expected  value  is  then  calculated.

1.    Estimating  Impacts  Within  Demonstration  Years

To  examine  demonstration  effects  over  the  course  of  the  demonstration,  we  estimate  a
regression  model  that  includes  dummy  variables  for  the  demonstration  year  in  which  the  admission
took  place,  as  well  as  interaction  terms  between  the  agency's  treatment  status  and  the  demonstration
year.  The  full  model  is  thus  given  by:

(1)  $Y = \alpha + X\beta + \delta T + \lambda_1 \mathrm{Year2} + \lambda_2 \mathrm{Year3} + \gamma_1 (T \times \mathrm{Year2}) + \gamma_2 (T \times \mathrm{Year3})$,

where:

$Y$  =  the  log  odds  of  the  occurrence  of  the  binary  outcome  variable
measured  after  a  patient's  admission  to  a  demonstration  home  health  agency

$\mathrm{Year2}$  =  a  variable  that  equals  one  if  the  admission  took  place  in  Year  2,  and  zero
otherwise

$\mathrm{Year3}$  =  a  variable  that  equals  one  if  the  admission  took  place  in  Year  3,  and  zero
otherwise

$X$  =  the  set  of  control  variables

$T$  =  a  binary  variable  for  treatment  status  that  equals  one  for  admissions  to  treatment
agencies,  and  zero  for  admissions  to  control  agencies

$\lambda_1$  =  the  coefficient  on  the  variable  $\mathrm{Year2}$

$\lambda_2$  =  the  coefficient  on  the  variable  $\mathrm{Year3}$

$\gamma_1$  =  the  coefficient  on  the  interaction  term  between  the  variables  $\mathrm{Year2}$  and  $T$.  This
interaction  term  equals  one  for  admissions  in  demonstration  Year  2  to  treatment
agencies,  and  zero  otherwise

$\gamma_2$  =  the  coefficient  on  the  interaction  term  between  the  variables  $\mathrm{Year3}$  and  $T$.  This
interaction  term  equals  one  for  admissions  in  demonstration  Year  3  to  treatment
agencies,  and  zero  otherwise

$\alpha$  =  the  intercept  term

$\beta$  =  the  vector  of  regression  coefficients  on  the  control  variables

$\delta$  =  the  regression  coefficient  on  the  variable  for  treatment  status

The  estimated  effect  of  prospective  payment  in  the  first  demonstration  year  is  given  by  $\delta$,  and
the  estimated  effects  in  the  second  and  third  demonstration  years  are  $\delta + \gamma_1$  and  $\delta + \gamma_2$,  respectively;
$\gamma_1$  measures  the  change  in  the  effect  of  prospective  payment  from  the  first  to  the  second  year  of  the
demonstration,  and  $\gamma_2 - \gamma_1$  measures  the  change  in  demonstration  effect  from  the  second  to  the  third
year  of  the  demonstration.
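
To make the roles of the coefficients in equation (1) concrete, the Python sketch below evaluates the linear predictor, derives the year-specific treatment effects on the log-odds scale, and computes a regression-adjusted control group mean by setting the treatment indicator to zero for every observation. The dictionary keys, function names, and the unweighted averaging are assumptions made for illustration; this is not the estimation code behind the report, and no coefficient values are taken from it.

```python
import math


def log_odds(coef, x_beta, treat, year):
    """Linear predictor of equation (1) for one admission.

    x_beta is the observation's contribution from the control variables.
    """
    year2 = 1 if year == 2 else 0
    year3 = 1 if year == 3 else 0
    return (
        coef["alpha"] + x_beta + coef["delta"] * treat
        + coef["lambda1"] * year2 + coef["lambda2"] * year3
        + coef["gamma1"] * treat * year2 + coef["gamma2"] * treat * year3
    )


def year_effects(coef):
    """Treatment effects on the log-odds scale in Years 1, 2, and 3."""
    return {
        1: coef["delta"],
        2: coef["delta"] + coef["gamma1"],
        3: coef["delta"] + coef["gamma2"],
    }


def adjusted_control_mean(coef, x_betas, year):
    """Mean predicted probability with T set to zero for all observations."""
    probabilities = [
        1.0 / (1.0 + math.exp(-log_odds(coef, x_beta, 0, year)))
        for x_beta in x_betas
    ]
    return sum(probabilities) / len(probabilities)
```

The difference in log odds between a treatment and a control admission with the same controls in a given year equals the corresponding entry of `year_effects`, which is the sense in which the sums of coefficients are the year-specific impacts.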
2.    Estimating  Subgroup  Impacts

Since  we  expect  impacts  on  quality  of  care  to  occur  most  likely  through  reductions  in  service,
we  focused  our  subgroup  analyses  on  those  subgroups  in  which  prospective  payment  had  the  largest
impacts:  namely,  "high-use"  versus  "low-use"  agencies.¹¹  High-use  agencies  are  those  with
predemonstration  practice  pattern  indices  above  the  median  value  for  all  agencies;  conversely,
low-use  agencies  have  indices  below  the  median.  First,  Trenholm  (2000a)  found  that,  in  the  early  part
of  the  demonstration,  prospective  payment  induced  high-use  agencies  to  make  larger  reductions
in  service  use  than  low-use  agencies.¹²  Archibald  and  Cheh  (2000)  then  found  that,  over  the  course
of  the  demonstration,  the  impact  of  prospective  payment  on  the  low-use  agencies  "caught  up"  to  the
impact  on  high-use  agencies  because  of  progressively  larger  cuts  in  visits  by  the  prospectively  paid
low-use  agencies.  As  in  the  previous  quality  report,  we  restricted  our  subgroup  analyses  to  a  subset
of  key  outcomes  selected  from  the  main  groups  of  outcomes,  in  order  to  maintain  a  manageable
number  of  additional  regressions.  The  subgroup  analyses  thus  examined  the  following  outcomes:
BADLs  (grooming,  bathing,  toileting,  transferring,  and  ambulation),  IADLs  (management  of
medications),  clinical  symptoms  (pain,  most  problematic  pressure  ulcer,  surgical  wound  status,
dyspnea,  urinary  incontinence  or  catheter  present,  and  confusion),  and  admission  to  a  Medicare  Part
A  service  for  a  same  body  system  diagnosis  (hospital  or  home  health  care).


¹¹We  also  performed  subgroup  analyses  on  two  additional  agency  subgroups:  for-profit/not-for-profit,
and  small/large.  These  analyses  found  no  significant  effects  and  are  not  presented  in  this
report.

¹²Two  other  apparent  subgroup  differences,  with  the  effects  of  prospective  payment  larger
among  for-profit  agencies  and  small  agencies  than  their  counterparts  (not-for-profit  and  large
agencies,  respectively),  were  explained  largely  by  the  greater  likelihood  that  these  agency  types  also
had  high-use  practice  patterns.
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Our  model  for  examining  impacts  within  agencies  with  high-  and  low-use  practice  patterns  is 
as  follows: 

$$
\begin{aligned}
Y = {} & \alpha + X\beta + \delta T + \lambda_1\,\mathit{Year2} + \lambda_2\,\mathit{Year3} + \varphi H
        + \gamma_1 (T \times \mathit{Year2}) + \gamma_2 (T \times \mathit{Year3}) \\
       & + \eta (H \times T) + \tau_1 (H \times \mathit{Year2}) + \tau_2 (H \times \mathit{Year3})
        + \xi_1 (H \times T \times \mathit{Year2}) + \xi_2 (H \times T \times \mathit{Year3}) + \varepsilon,
\end{aligned}
\tag{2}
$$

where:

$Y$, $\alpha$, $X$, $\beta$, $\delta$, $T$, $\lambda_1$, Year2, $\lambda_2$, Year3, $\gamma_1$, $\gamma_2$, and $\varepsilon$ are as defined above for equation (1), and:

$H$ = a binary variable for agency practice pattern that equals one for admissions to high-use
agencies, and zero for admissions to low-use agencies

$\varphi$ = the coefficient on the variable $H$

$\eta$ = a coefficient on the interaction term between the variables $H$ and $T$; this interaction
term equals one for admissions to treatment agencies with high-use practice
patterns, and zero otherwise

$\tau_1$ = a coefficient on the interaction term between the variables $H$ and Year2; this
interaction term equals one for admissions to agencies with high-use practice
patterns in demonstration Year 2, and zero otherwise

$\tau_2$ = a coefficient on the interaction term between the variables $H$ and Year3; this
interaction term equals one for admissions to agencies with high-use practice
patterns in demonstration Year 3, and zero otherwise

$\xi_1$ = a coefficient on the interaction term between the variables $H$, Year2, and $T$; this
interaction term equals one for admissions to treatment agencies with high-use
practice patterns in demonstration Year 2, and zero otherwise

$\xi_2$ = a coefficient on the interaction term between the variables $H$, Year3, and $T$; this
interaction term equals one for admissions to treatment agencies with high-use
practice patterns in demonstration Year 3, and zero otherwise

Since our main interest lies in whether prospective payment affected different types of agencies
differentially, we are interested in statistical tests for treatment effects within each subgroup, within
each year. The estimated effect of prospective payment in high-use agencies in Year 1 is $\delta + \eta$, and
the estimated difference in the effect of prospective payment between high-use and low-use
subgroups is $\eta$. In Year 2, the estimated effect of prospective payment in high-use agencies is
$\delta + \eta + \gamma_1 + \xi_1$, and the estimated difference in the effect of prospective payment between high-use and
low-use subgroups is $\eta + \xi_1$. Finally, in Year 3, the estimated effect of prospective payment in
high-use agencies is $\delta + \eta + \gamma_2 + \xi_2$, and the estimated difference in the effect of prospective payment
between high-use and low-use subgroups is $\eta + \xi_2$.
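
The sums of coefficients described above can be computed mechanically once the subgroup model has been estimated. The following is a minimal sketch, not the estimation code used for this report: the function name, the dictionary keys, and the coefficient values are hypothetical and serve only as an illustration. The low-use effect is obtained by setting the high-use indicator to zero, which leaves the treatment coefficient plus the year-specific treatment interaction.

```python
def subgroup_effects(coef):
    """Return prospective payment effects by year and practice pattern.

    `coef` maps hypothetical names for the subgroup-model coefficients
    (delta, eta, gamma1, gamma2, xi1, xi2) to their estimates.
    """
    # Year-specific shift in the treatment effect (Year 1 is the reference year).
    year_shift = {1: 0.0, 2: coef["gamma1"], 3: coef["gamma2"]}
    # Year-specific shift in the high-use versus low-use difference.
    high_shift = {1: 0.0, 2: coef["xi1"], 3: coef["xi2"]}

    effects = {}
    for year in (1, 2, 3):
        low = coef["delta"] + year_shift[year]
        difference = coef["eta"] + high_shift[year]
        effects[year] = {
            "low_use": low,
            "high_use": low + difference,
            "difference": difference,
        }
    return effects


# Made-up coefficient values (percentage points), for illustration only.
example = {"delta": -1.0, "eta": 0.5, "gamma1": 0.25,
           "gamma2": -0.5, "xi1": 0.25, "xi2": 1.0}
effects = subgroup_effects(example)
for year, row in effects.items():
    print(year, row)
```

Testing whether any of these sums differs from zero also requires the estimated covariances among the coefficients, which this sketch leaves out.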

3.    Hypothesis  Testing 

For each outcome, a two-tailed t-statistic tests the null hypothesis that specific coefficients or
sums of coefficients, corresponding to specified effects of prospective payment, are zero. The
associated p-value indicates the probability of obtaining a sample estimate at least as large in
magnitude as the one observed if the null hypothesis were true. The p-value is based on estimated
standard errors that account for the clustering of patients within agencies and the use of sample
weights. A p-value of less than .10 indicates rejection of the null hypothesis and provides
statistically significant evidence that a demonstration impact probably exists. At this significance
level, however, approximately 10 percent of independent tests will show, simply by chance, a
statistically significant treatment-control difference when there is no true program effect (known
as Type I error). Therefore, in assessing whether a statistically significant treatment-control
difference (especially one with a p-value between .05 and .10) should be interpreted as a true
program impact, we consider whether the sign and magnitude of the predicted effect are consistent
with those for related outcomes.
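
To make the Type I error point concrete, the short sketch below computes how many chance findings to expect at the .10 level when every null hypothesis is true. It is an illustration only: the function names and the figure of 20 tests are hypothetical, and the calculation assumes the tests are independent, which related outcomes measured on the same patients generally are not.

```python
def expected_false_positives(num_tests, alpha=0.10):
    """Expected count of significant results when all null hypotheses are true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests, alpha=0.10):
    """Chance that at least one of `num_tests` independent true-null tests rejects."""
    return 1.0 - (1.0 - alpha) ** num_tests


num_tests = 20  # hypothetical number of outcomes tested in one year
print(expected_false_positives(num_tests))
print(round(prob_at_least_one_false_positive(num_tests), 3))
```

Under these assumptions a family of 20 tests is more likely than not to produce at least one spurious difference, which is why isolated significant results are weighed against the pattern for related outcomes.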


We use two-tailed tests throughout our analysis to avoid confusion and to flag estimates of the
"wrong" expected sign that are large enough to be statistically significant. For impacts with the
"correct" expected sign, a two-tailed test is less likely than a one-tailed test to reject the hypothesis
of no demonstration effect (all else being equal).

III.  RESULTS


The results of this report show that prospectively paid agencies' progressive reductions in service
caused no major adverse effects on patient outcomes. Whether the
demonstration was entirely devoid of any impacts on the quality of care is harder to determine.
Before  presenting  the  results  of  our  analyses,  we  briefly  review  the  impacts  of  prospective  payment 
on  service  use  over  the  course  of  the  demonstration;  then  we  discuss  how  the  observed  changes  in 
service  use  might  affect  the  quality  of  care.  Next,  we  report  the  results  of  the  main  analyses,  assess 
their  sensitivity  to  different  analytic  specifications,  and  discuss  their  interpretation.  Finally,  we 
present  and  interpret  the  results  of  the  subgroup  analyses. 

A.   PREVIOUS  FINDINGS  ON  SERVICE  USE  AND  QUALITY  OF  CARE 

Based on analysis of data from the first two years of the demonstration, Trenholm (2000a) found
that prospective payment generated large reductions in service use (a 24 percent reduction in the
average number of visits in the year following admission for patients of treatment agencies, compared
to those of control agencies). Treatment agencies achieved these reductions both by shortening
episode lengths and by lowering the frequency of visits provided. Reductions, which occurred in
"at-risk" as well as "outlier" periods, were of a similar proportion for all types of visits.

Reports  on  quality  of  care  (Chen  2000),  Medicare  service  use  (Schore  2000),  and  non-Medicare 
service  use  (Phillips  2000)  found  that,  at  least  in  the  first  two  years  of  the  demonstration,  agencies 
were  able  to  make  these  substantial  cuts  in  service  provision  without  adversely  affecting  patient 
outcomes  or  increasing  the  use  of  other  Medicare  or  non-Medicare  services.  Specifically,  the 
previous quality report (Chen 2000) detected no adverse impacts of prospective payment on
numerous measures of patient health, function, and health services use.¹ Nor was there any evidence
for differential effects of prospective payment among any agency or patient subgroups.

Importantly  for  this  report,  Archibald  and  Cheh's  (2000)  follow-up  report  on  service  use,  using 
data  from  all  three  years  of  the  demonstration,  found  that  both  treatment  and  control  agencies 
continued  to  reduce  both  the  number  of  visits  they  provided  and  the  length  of  patient  episodes  in  the 
third  demonstration  year.  Reductions  over  time  by  both  groups  roughly  paralleled  each  other,  so  that 
demonstration  impacts  remained  large  and  stable  throughout  all  three  years  of  the  demonstration. 

B.   POTENTIAL  EFFECTS  OF  CONTINUED  REDUCTIONS  IN  SERVICE  USE  IN 
YEAR  3  ON  QUALITY  OF  CARE 

Although  we  previously  reported  that  the  service  reductions  in  the  first  two  years  had  no 
discernible  effects  on  quality  of  care,  it  is  unclear  whether  the  additional  cuts  in  service  in  Year  3 
would  continue  to  have  no  impacts  on  quality  of  care.  As  discussed  in  Chen  (2000),  an  agency's 
patient  outcomes  likely  depend  not  only  on  the  volume  of  visits  provided,  but  also  on  the  content 
and  quality  of  the  visits,  and  the  baseline  level  of  service  provision.  That  is,  while  an  agency 
starting  from  a  high  (or,  perhaps,  even  excessive)  level  of  service  provision  may  have  plenty  of 
"room"  for  service  reductions,  one  already  at  a  low  level  of  service  may  not.  It  is  conceivable  that, 
assuming  constant  content  or  quality  of  visits,  treatment  agencies  might  approach  a  threshold  or 
minimum  level  of  service  as  they  continued  to  reduce  service  provision  over  the  course  of  the 
demonstration. On the other hand, agencies may find ways to improve the quality of care in such
a way as to maintain patient well-being despite progressively fewer visits (for example, through
better patient education materials or through clinical pathways to reduce unnecessary variations in
nurses' practices).

¹There was weak evidence that patients of prospectively paid agencies experienced slightly
fewer emergency visits to clinics and doctors' offices, and hospitalizations for same body system
diagnoses. Patients of treatment agencies were also slightly more dissatisfied with the interpersonal
care from agency staff than were patients of control agencies.

C.   DEMONSTRATION  EFFECTS  ON  PATIENT  FUNCTIONING 

For  the  first  two  years  of  the  demonstration,  there  were  essentially  no  differences  between 
patients  of  treatment  and  control  agencies  in  improvement  and  stabilization  in  functioning,  but  in 
demonstration  Year  3,  patients  of  treatment  agencies  appeared  somewhat  less  likely  to  stabilize  in 
several  areas  of  functioning. 

1.    Basic  Activities  of  Daily  Living 

Table III.1 shows that, in the first demonstration year, there were no significant differences in
any of the five BADLs between treatment and control agency patients. In demonstration Year 2,
there were two small, positive, marginally statistically significant differences between the treatment
and control groups in improvement in grooming and improvement in bathing (that is, treatment
agency patients were more likely to improve in these BADLs than were control agency patients).
At roughly six percent of the corresponding control group means, both of these positive estimated
differences were small.

In  demonstration  Year  3,  however,  a  few  negative  treatment-control  differences  appear  for  the 
first  time,  but  only  among  the  stabilization  outcomes.  Specifically,  patients  of  prospectively  paid 
agencies  appear  less  likely  than  patients  of  cost-reimbursed  agencies  to  stabilize  (that  is,  more  likely 
to  worsen)  in  grooming,  bathing,  and  toileting.  Relatively  speaking,  however,  these  effects  are 
small, roughly only two to three percent of the control group mean.

2.    Instrumental Activities of Daily Living

The IADLs show a similar pattern of results to those for the BADLs (Table III.2). In the first
demonstration year, there appear to be negligible effects on IADLs. In the second demonstration
year,  we  see  two  marginally  statistically  significant  treatment-control  differences  in  opposing 
directions.  In  the  first,  treatment  agency  patients  are  roughly  three  percentage  points  less  likely  to 
improve  in  light  meal  preparation  than  control  agency  patients  (a  difference  representing  about  six 
percent  of  the  control  group  mean).  In  the  second,  treatment  agency  patients  are  roughly  three 
percentage  points  more  likely  to  improve  in  housekeeping  than  control  agency  patients  (a  difference 
representing  about  eight  percent  of  the  control  group  mean).  Finally,  in  the  third  demonstration  year, 
treatment  agency  patients  have  statistically  significantly  lower  rates  of  stabilization  (higher  rates  of 
worsening)  than  control  agency  patients  for  three  areas:  (1)  light  meal  preparation  (a  difference  of 
about  two  percent  relative  to  the  control  group  mean),  (2)  housekeeping  (a  difference  of  about  six 
percent  of  the  control  group  mean),  and  (3)  management  of  medications  (nearly  three  percent  of  the 
control  group  mean).  In  addition,  treatment  agency  patients  have  a  3.5  percentage  point  lower  rate 
of  improvement  in  management  of  medications,  or  a  difference  of  slightly  more  than  9  percent  of 
the  control  group  mean. 
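
The text reports each difference both in percentage points and as a percent of the control group mean. The conversion between the two is sketched below; the function name is hypothetical, and the control group mean of 38 percent is an assumed value chosen for illustration rather than a figure taken from Table III.2.

```python
def percent_of_control_mean(difference_pp, control_mean_pp):
    """Express a treatment-control difference (in percentage points)
    as a percent of the control group mean (also in percentage points)."""
    if control_mean_pp <= 0:
        raise ValueError("control group mean must be positive")
    return 100.0 * difference_pp / control_mean_pp


# Assumed inputs: a 3.5 percentage point difference and a 38 percent control mean.
relative = percent_of_control_mean(3.5, 38.0)
print(round(relative, 1))
```

With these assumed inputs, a 3.5 percentage point gap corresponds to a relative difference of a little over 9 percent, which shows how a small absolute gap can look larger when the control group rate is low.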

D.   DEMONSTRATION  EFFECTS  ON  PATIENT  MORTALITY  AND  CLINICAL 
SYMPTOMS 

The results for clinical symptoms are generally favorable to treatment agencies, in contrast to
the somewhat unfavorable basic and instrumental ADL results discussed above. Although mortality
at 120 days showed no significant treatment-control differences, treatment agency patients were
significantly more likely than control agency patients to improve in dyspnea, urinary incontinence
or catheter, and behavior problem frequency in two consecutive demonstration years, and in pain and
confusion in all three demonstration years (Table III.3). Furthermore, these treatment-control
differences favoring the treatment agencies are substantial in magnitude. Expressed as percentages
of the control group mean, the annual differences are around 9 percent for improvement in
dyspnea, 13 to 15 percent for urinary incontinence, 11 to 16 percent for behavior problem frequency,
8 to 12 percent for pain, and 11 to 18 percent for confusion. Additional statistically significant
treatment-control differences included a slightly better rate for treatment agencies in stabilization
in urinary incontinence in demonstration Year 1 (by only about one percentage point), and a slightly
worse rate for treatment agencies in stabilization of most problematic pressure ulcer (also by one
percentage point).

E.   DEMONSTRATION  EFFECTS  ON  PATIENTS'  USE  OF  SELECTED  HEALTH 
SERVICES 

Prospective payment had no lasting effect on use of emergency care services. In demonstration
Year 1, rates for emergency room visits and emergency visits to physician offices for treatment
agency patients were slightly lower than for control agency patients (by about one percentage point),
leading to a significantly lower rate for the combined outcome of any emergency visits (Table III.4).
This effect had dissipated by Years 2 and 3, however, suggesting a transient effect only, or
perhaps no true effect at all.

An increased likelihood of home health agency admission for same body system diagnoses
among treatment agency patients appears in demonstration Year 3 (Table III.4). However, this
treatment-control difference of about two and a half percentage points in Year 3 appears to be due
in part to a drop in the control group mean, from roughly five and a half percent in the first two years
to slightly less than five percent in Year 3, along with an increase in the treatment group mean,
from slightly less than six percent to slightly less than seven percent.² Table III.4 also shows that
there are no apparent effects for the other two types of institutional use outcomes (hospital and SNF
same body system diagnosis admissions).

F.    ROBUSTNESS  OF  ESTIMATES 

In  light  of  the  relatively  large  number  of  statistically  significant  effects,  some  favoring  the 
treatment  group  and  others  the  control  group,  it  is  important  to  assess  the  extent  to  which  the 
findings  might  be  artifacts  of  the  analytic  approach  before  drawing  any  inferences.  We  first 
considered  whether  differences  in  the  sample  of  agencies  analyzed  might  have  caused  our  current 
estimates to differ from those in the previous quality report, in which we had found no important
effects of prospective payment on any of the outcomes. The previous report had included 86
agencies (46 treatment and 40 control), whereas, due to agency dropout and failure to submit
complete QA data, the current report includes only 76 agencies (40 treatment and 36 control). Using
the QA data from the previous report, we restricted the number of agencies to the smaller sample of
agencies  analyzed  in  this  report,  and  reestimated  the  regressions.  The  results  were  essentially 
unchanged  from  our  previous  results  (except  that  there  were  even  fewer  statistically  significant 
differences  than  before,  due  to  smaller  sample  sizes). 

Next, we examined the sensitivity of our estimates to the weighting scheme. As explained in
Chapter II, our primary analyses weighted observations inversely proportional to the size of their
agency, to give each agency equal representation within each demonstration year (what we call
"equal-weighted" analyses). Although equal-weighted analyses are appropriate, since agencies are
the units of intervention and randomization, atypical or anomalous observations from very small
agencies (with correspondingly large weights) could possibly distort the results. To check for this
possibility, we performed the analyses without weights. Since large agencies contribute more
observations than small agencies, unweighted analyses implicitly weight agencies by their size
(called "size-weighted" analyses).

²The control group mean for Year 3 was not significantly different from the control group mean
in Year 1 (as determined by a statistical test on the coefficient on the demonstration Year 3 indicator
variable). We did not, however, formally test the hypothesis that the control group mean for Year 3
was significantly different from that of Year 2 (by a statistical test of whether the coefficients on the
Year 3 and Year 2 variables were significantly different).
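
The difference between the two weighting schemes can be illustrated with a small numerical sketch. The data, agency labels, and function name below are hypothetical and are not drawn from the demonstration; the example only shows how a very small agency with atypical outcomes moves the equal-weighted rate more than the size-weighted rate.

```python
from collections import Counter


def weighted_rate(outcomes, weights):
    """Weighted mean of binary outcomes (1 = outcome achieved, 0 = not)."""
    return sum(w * y for y, w in zip(outcomes, weights)) / sum(weights)


# Hypothetical records for one demonstration year: (agency, outcome).
records = [("small", 0), ("small", 0),
           ("large", 1), ("large", 1), ("large", 1), ("large", 1),
           ("large", 1), ("large", 1), ("large", 0), ("large", 0)]

agency_sizes = Counter(agency for agency, _ in records)
outcomes = [y for _, y in records]

# Equal weighting: each observation is weighted by 1 / (size of its agency),
# so the observations of every agency sum to the same total weight.
equal_weights = [1.0 / agency_sizes[agency] for agency, _ in records]
# Size weighting: the unweighted analysis, where every observation counts once.
size_weights = [1.0] * len(records)

equal_weighted = weighted_rate(outcomes, equal_weights)
size_weighted = weighted_rate(outcomes, size_weights)
print(equal_weighted, size_weighted)
```

In this made-up example the equal-weighted rate is the simple average of the two agency rates, while the size-weighted rate is dominated by the large agency.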

Not all of the treatment-control differences in the basic and instrumental ADL outcomes
previously significant in the equal-weighted analyses remain so in the size-weighted analyses
(Table III.5). We are still left, however, with three treatment-control differences in which treatment
agency patients have worse outcomes: two in BADLs (grooming and bathing) and one in IADLs
(management of medications). All three are in stabilization outcomes, and all three occur in
Year 3.

Among the remaining outcomes of mortality, clinical symptoms, emergency care, and use of
Medicare Part A covered services, several treatment-control differences remain significant under
both equal- and size-weighting (Table III.6). Most notably, 10 of the 13 differences in outcomes of
symptom improvement that favor treatment agencies are robust to size-weighting, as are all four of
the differences in emergency care outcomes that also favor the treatment group.³ However, the
increased rate of home health admission for same body system diagnoses for treatment agency
patients in Year 3 is not robust to size-weighting. In the following discussion we focus on
treatment-control differences that are robust to size-weighting.


³Recall that use of emergency care is considered an adverse outcome, so that, unlike
improvement and stabilization in ADLs and symptoms, negative treatment-control differences in
emergency care favor the treatment group.

TABLE III.5

SUMMARY TABLE OF SENSITIVITY OF RESULTS FOR IMPROVEMENT AND STABILIZATION
IN ACTIVITIES OF DAILY LIVING TO DIFFERENT WEIGHTING METHODS

Columns: Demonstration Year 1, Demonstration Year 2, and Demonstration Year 3, each reported under Equal Weightᵃ and Size Weightᵇ.

Rows (Improvement and Stabilization for each outcome): Grooming; Bathing; Toileting; Transferring; Ambulation; Light Meal Preparation; Housekeeping; Management of Medications.

[Cell entries are not legible in the source, apart from two "0" entries for Management of Medications (one for Improvement, one for Stabilization) whose columns cannot be determined.]

+ = Estimated treatment group value exceeds estimated control group value, p < .10.
- = Estimated treatment group value falls below estimated control group value, p < .10.
0 = Estimated difference between treatment and control group values with p > .10.

ᵃ"Equal weight" refers to the analysis in which observations are weighted to give agencies equal representation within each year.

ᵇ"Size weight" refers to the analysis in which observations are unweighted, which, in effect, gives agencies representation proportional to
their size.

TABLE III.6

SUMMARY TABLE OF SENSITIVITY OF RESULTS FOR MORTALITY, IMPROVEMENT, AND STABILIZATION IN
CLINICAL SYMPTOMS, AND USE OF SELECTED HEALTH SERVICES, TO DIFFERENT WEIGHTING METHODS

                                            Demonstration   Demonstration   Demonstration
                                            Year 1          Year 2          Year 3
Outcome                                     Equalᵃ  Sizeᵇ   Equalᵃ  Sizeᵇ   Equalᵃ  Sizeᵇ

Mortality Within 120 Days                   0       0       0       0       0       0

Pain
  Improvement                               +       0       +       +       +       +
  Stabilization                             0       0       0       0       0       0

Most Problematic Pressure Ulcer
  Improvement                               0       0       0       0       0       0
  Stabilization                             0       0       0       -       -       0

Surgical Wound Status
  Improvement                               ?       0       0       0       0       0
  Stabilization                             0       0       0       0       0       0

Dyspnea
  Improvement                               +       +       +       0       0       0
  Stabilization                             0       0       0       0       0       0

Urinary Incontinence or Catheter Present
  Improvement                               0       +       +       +       +       +
  Stabilization                             +       +       0       0       0       0

Confusion
  Improvement                               +       +       +       0       +       +
  Stabilization                             0       0       0       0       0       0

Behavior Problem Frequency
  Improvement                               ?       +       +       +       +       +
  Stabilization                             0       0       0       0       0       0

Emergency Careᶜ
  Hospital Emergency Room                   (incomplete in source; legible entries: 0, 0, 0, 0)
  Outpatient Clinic or Urgent Care Clinic   (incomplete in source; legible entries: 0, 0, 0, 0)
  Physician's Office                        (incomplete in source; legible entries: 0)
  Any of the above                          (incomplete in source; legible entries: 0, 0, 0, 0)

Admission to Medicare Part A Covered Services for SBSDᶜ
  Admission to Hospital                     0       0       0       0       0       0
  Admission to SNF                          (incomplete in source; legible entries: 0, 0, 0, +, 0)
  Admission to HHA                          0       0       0       0       +       0

+ = Estimated treatment group value exceeds estimated control group value, p < .10.
- = Estimated treatment group value falls below estimated control group value, p < .10.
0 = Estimated difference between treatment and control group values with p > .10.
? = Entry not legible in the source.

ᵃ"Equal weight" refers to the analysis in which observations are weighted to give agencies equal representation within each year.

ᵇ"Size weight" refers to the analysis in which observations are unweighted, which, in effect, gives agencies representation proportional to
their size.

ᶜEmergency care and admissions for same body system diagnoses (SBSD) are considered adverse outcomes, so that, unlike improvement and
stabilization in ADLs and symptoms, negative treatment-control differences in these outcomes favor the treatment group.

G.  DISCUSSION


The data clearly show no obvious or large-scale adverse effects of prospective payment on
quality of care. There are no deleterious impacts on such major outcomes as mortality, use of
emergency services, hospitalization for same body system diagnosis, or SNF admission for same
body system diagnosis.

It  is  less  clear  whether  prospective  payment  had  no  effects  at  all  on  the  quality  of  care,  as  there 
are  conflicting  treatment-control  differences.  On  the  one  hand,  patients  of  treatment  agencies  have 
worse  rates  of  stabilization  in  Year  3  in  the  three  ADLs  of  grooming,  bathing,  and  medication 
management.  On  the  other  hand,  patients  of  treatment  agencies  also  have  superior  rates  of 
improvement  in  several  clinical  symptoms. 

A  pessimist  might  see  the  treatment  group's  greater  likelihood  of  worsening  in  the  three  ADLs 
as  a  worrisome  sign  that  the  treatment  agencies'  quality  of  care  is  starting  to  deteriorate.  These  three 
effects  are  robust  to  the  alternative  weighting  scheme,  and  Year  3  is  precisely  when  the  treatment 
agencies,  with  their  steady  service  reductions,  reach  their  lowest  levels  of  service  (Archibald  and 
Cheh  2000). 

On further examination, though, these apparent effects on ADL stabilization are less convincing.
First, as noted earlier, they are relatively small in magnitude, ranging from two to six percent of the
control group mean. Second, and more important, it is hard to reconcile the treatment-control
differences in ADL stabilization with the almost complete lack of differences in ADL improvement.⁴
In other words, the prospectively paid agencies would have had to cut their services and reallocate
their staff in such a way as to make their patients as likely as control agency patients to get
better in function yet, simultaneously, more likely to get worse.⁵

⁴That there are fewer statistically significant differences in ADL improvements may also be due
to lower power for these analyses than for the ADL stabilization analyses. The improvement
outcomes have smaller sample sizes (the improvement samples exclude patients at the best level of
an item, whereas the stabilization samples exclude only patients at the worst level of an item) and
greater variability (the variance of a binary variable is p(1-p), and rates for the improvement
outcomes were around 50 percent versus roughly 80 percent for the stabilization outcomes). Those
ADL improvement outcomes that are significant, however (in grooming, bathing, light meal
preparation, and housekeeping, all in Year 2), all favor the treatment group and are thus in the
opposite direction from the stabilization results. None of these significant differences in ADL
improvement were robust to size-weighting. Counting both significant and nonsignificant estimates,
there are roughly as many favoring the treatment group as favoring the control group.

⁵Although we discuss below how shortened episodes or reduced visits in the treatment group
might lead to a measurement bias in the clinical symptom outcomes, measurement bias seems
unlikely to be operating in the ADL outcomes. First, assuming patients' functional improvement
is a steadily increasing function of time (no dips or biphasic portions), shorter episodes in the
treatment group with patients assessed earlier in recovery should lead to lower rates of ADL
improvement and similar rates of ADL stabilization, which is not what we observe. (Recall that
stabilization includes both improvement and no worsening. In a shorter time frame, fewer people
may have improved; but since they are presumably no worse than baseline, the total number
stabilized remains the same.) Second, we found no evidence of such a bias in the QA data in our
previous report on quality (Chen 2000). In that report, we had data from a survey that was
administered at a fixed time point after discharge and thus free from the potential problems of
differing visit frequencies or episode lengths. Results from the survey agreed with results from the
QA data.

The only way we can imagine such a pattern of results is if the prospectively paid agencies
engaged in a form of "triaging." Treatment agency staff would have had to concentrate their efforts
on those patients with a higher potential for functional improvement, at the expense of those with
a lower likelihood of such progress. In fact, in supplemental analyses to investigate this possibility,
we found no evidence for such triaging behavior. We estimated a series of models for the outcomes
of stabilization in grooming, bathing, toileting, ambulation, and management of medications, in
which we included as an explanatory variable an interaction term between prospective payment and
admission rehabilitation prognosis.⁶ According to these models, patients with a guarded
rehabilitation prognosis fared no differently under prospective payment than those with a good
rehabilitation prognosis.

⁶The rehabilitation prognosis variable was an item on the QA Start of Care Instrument, which
asked the agency nurse to judge at the initial assessment whether the patient's prognosis for
functional status was "good: marked improvement in functional status is expected" or "guarded:
minimal improvement in functional status is expected; decline is possible." We did not include this
variable as a control variable in the main analyses because of the possibility of endogeneity.

A related issue is the lack of a clear pathway between the known shifts in service delivery under
prospective payment and the observed pattern of functioning outcomes. The reductions in visits
over time among both treatment and control agencies were primarily in skilled nursing and home
health aide visits. Although the yearly reductions in visits in both groups were of roughly the same
size, so that the treatment-control differences in numbers of visits actually remained fairly constant
over time, treatment agencies always provided fewer visits than control agencies. In contrast, the
number of therapy visits of all types (physical, occupational, and speech) provided by either group
of agencies did not change significantly over the three years (Archibald and Cheh 2000). As
discussed below, we can see how decreased nursing visits can bias symptom improvement rates
upward, but the reduced reinforcement of therapists' instructions from the increasingly fewer visits
by aides and nurses of the treatment agencies should manifest in lowered rates of both improvement
and stabilization, not just stabilization alone.

In the end, we are doubtful that the apparent adverse effects of prospective payment on ADL
stabilization signify true impacts on quality of care. We can speculate that the apparent effects
represent chance findings, but our conclusion is that they constitute, at best, only weak and
inconsistent evidence for any decline in quality in prospectively paid agencies.

On the other hand, an optimist might focus on the treatment group's better rates of improvement
for several of the clinical symptoms and conclude that prospectively paid agencies provided superior
care. There are strong reasons to doubt this conclusion, however. The apparently greater rates of
symptom improvement among the treatment group could well be the result of biases in the
assessment of symptoms, and not of any real improvements in care.

One  way  the  measurement  of  symptoms  could  become  biased  is  through  treatment-control 
differences  in  the  extent  of  patient  and  staff  contact.  Perhaps  the  resource-conscious  treatment 
agency  staff  found  it  easy  to  reduce  visits  and  episode  lengths  among  patients  likely  to  improve  or 
remain  stable  in  clinical  symptoms,  but  were  less  able  or  unwilling  to  do  so  among  patients  with 
unstable  or  deteriorating  symptoms.  This  differential  reduction  in  the  amount  of  contact  by  staff 
could  then  lead  to  a  selective  inaccuracy  in  the  follow-up  assessments  of  patients'  symptoms.  The 
symptom  questions  ask  nurses  to  report  the  frequency  of  patients'  symptoms,  with  response 
categories  such  as  "never"  to  "all  of  the  time."^  Because  patients'  symptoms  tend  to  be  intermittent, 
a  reduction  in  nurses'  contact  with  patients  might  well  lead  nurses  to  underestimate  the  frequency 
of  symptoms  and  thus  rate  symptoms  as  better  than  they  actually  are,  causing  a  proportion  of  stable 
patients  to  be  misclassified  as  "improved."  (Patients  who  do  improve  would  still  be  classified  as 
"improved.")  However,  because  we  have  postulated  the  reduction  in  contact  to  be  selective,  the 
patients  with  whom  agency  staff  have  less  contact  are  precisely  those  whose  symptoms  tend  either 
to  improve  or  to  remain  unchanged  from  baseline.  Patients  whose  symptoms  actually  worsened  from 
baseline,  in  contrast,  would  be  the  patients  with  whom  agency  staff  had  greater  contact;  so  staff 
would tend to report their symptoms accurately as "worsened." Thus, the treatment agency staff's
overly optimistic assessments of patients with stable or improved symptoms, combined with accurate
assessment of patients with worsened symptoms, would shift some of their truly stable patients into
the improved category, thus giving rise to a spurious treatment group superiority in symptom
improvement, with no treatment-control differences in symptom stabilization (recall that stable
patients are those whose symptoms are either unchanged or improved).^

^The questions on symptoms contrast with the questions on ADLs, which require yes/no
determinations of patients' ability to perform the ADLs under various circumstances.

Another possible mechanism for measurements to become biased is unconscious efforts by
treatment  agency  nurses  to  reduce  "cognitive  dissonance."  Cognitive  dissonance  is  "the  distressing 
mental  state  in  which  people  find  themselves  doing  things  that  don't  fit  with  what  they  know,  or 
having  opinions  that  do  not  fit  with  other  opinions  they  hold"  (Festinger  1957).  One  way  people 
attempt  to  reduce  such  distress  is  by  modifying  dissonant  knowledge  or  perceptions  to  become  more 
"consonant."  Patients  with  improved  or  stable  symptoms  would  likely  be  those  who  were  discharged 
earlier.  Finding  these  rapid  discharges  unfamiliar  or  uncomfortable,  nurses  of  prospectively  paid 
agencies  might  unconsciously  justify  these  discharges  to  themselves  by  seeing  more  improvement 
than  there  actually  was,  especially  in  the  symptom  outcomes  which,  as  we  have  suggested,  may  be 
more  prone  to  subjectivity  than  the  ADL  outcomes.^  The  effect  of  such  bias  would  again  be  to  shift 
some  patients  from  the  stable  category  into  the  improved  category,  causing  inflated  improvement 
rates  for  the  treatment  group  with  no  changes  in  stabilization  rates. 
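
The arithmetic behind this argument can be sketched with a small worked example. The patient counts and the 10 percent misclassification share below are hypothetical values chosen only for illustration; they are not estimates from the demonstration. The sketch shows that rating some truly unchanged patients as improved raises the measured improvement rate while leaving the measured stabilization rate untouched, because stable patients are those who are either unchanged or improved.

```python
def outcome_rates(improved, unchanged, worsened):
    """Return (improvement_rate, stabilization_rate) for one group of patients."""
    total = improved + unchanged + worsened
    improvement_rate = improved / total
    # Stable patients are those whose symptoms are either unchanged or improved.
    stabilization_rate = (improved + unchanged) / total
    return improvement_rate, stabilization_rate


# Hypothetical true outcomes, assumed identical in both groups (no real effect).
TRUE_IMPROVED, TRUE_UNCHANGED, TRUE_WORSENED = 400, 400, 200

# Control group: follow-up assessments are assumed to be accurate.
control_rates = outcome_rates(TRUE_IMPROVED, TRUE_UNCHANGED, TRUE_WORSENED)

# Treatment group: assume 10 percent of truly unchanged patients are rated
# "improved"; worsened patients are assumed to be rated accurately.
misrated = TRUE_UNCHANGED // 10
treatment_rates = outcome_rates(
    TRUE_IMPROVED + misrated, TRUE_UNCHANGED - misrated, TRUE_WORSENED
)

print("control   (improvement, stabilization):", control_rates)
print("treatment (improvement, stabilization):", treatment_rates)
```

Under these assumed counts the two groups differ in measured improvement but not in measured stabilization, even though their true outcomes are identical.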

In  light  of  the  preceding  discussion,  the  most  reasonable  interpretation  of  the  conflicting  results 
may  ultimately  be  that  prospective  payment  has  no  real  effects  on  quality  of  care.  We  discussed 
above  how  the  treatment  group's  isolated  lower  rates  of  ADL  stabilization  provide  little  evidence 
for declining quality of care, as well as how the treatment group's higher rates of symptom
improvement provide scant support for improved quality of care. Furthermore, no straightforward
mechanism of service reductions by the prospectively paid agencies appears capable of easily
encompassing both the ADL stabilization and the symptom improvement results.

^We considered controlling for episode length and visit frequency in the regressions but did not
do so because inclusion of these two likely endogenous variables might run the risk of producing
biased estimates of the treatment effect.

^Preliminary kappa statistics suggest that the QA behavioral measures tend to be somewhat less
reliable than the ADL measures. Kappas for some of the behavioral measures are: pain, 0.55; urinary
incontinence or catheter present, 0.77; confusion, 0.62; and behavior problem frequency, 0.37, while
kappas for the ADLs are: grooming, 0.63; bathing, 0.68; toileting, 0.82; transferring, 0.76;
ambulation, 0.77; light meal preparation, 0.58; housekeeping, 0.50; and management of medications,
0.73 (Abt Associates 1999).

We  suggest  that  perhaps  the  most  prudent  stance  to  adopt  in  the  face  of  these  results  is  one  of 
heavily  guarded  optimism.  On  the  one  hand,  we  feel  somewhat  reassured  by  the  absence  of  adverse 
impacts  on  serious  outcomes,  and  the  lack  of  clear-cut  evidence  for  detrimental  impacts  on  lesser 
outcomes;  but  on  the  other,  the  ambiguous  results  and  hints  of  possible  negative  effects  should 
concern  us.  Clearly,  then,  continued  close  monitoring  of  the  quality  of  care  under  prospective 
payment  is  warranted. 

H.  SUBGROUP  ANALYSIS  OF  HIGH-  AND  LOW-USE  AGENCIES 

As  described  in  Chapter  II,  we  performed  subgroup  analyses  on  selected  quality  of  care  outcome 
variables  for  high-  and  low-use  agencies,  because  Archibald  and  Cheh  (2000)  had  shown  a  striking 
time trend in differences in impacts between these two subgroups.^10 Although the Year 1 impacts
on  service  use  of  prospective  payment  were  much  larger  among  high-use  agencies  than  among  low- 
use  agencies,  by  Year  3  the  differences  in  impacts  had  shrunk  dramatically,  primarily  because  the 
low-use  treatment  agencies  had  "caught  up"  to  the  high-use  treatment  agencies  by  making 
progressively  larger  cuts  in  the  number  of  visits.  In  contrast,  service  use  among  the  remaining 
subgroups (high-use treatment agencies, and high- and low-use control agencies) changed relatively
little  over  time. 

^10 We also performed subgroup analyses on the for-profit versus not-for-profit agency subgroups
and the small- versus large-agency subgroups. Neither set of analyses revealed any significant
differences in any of the outcomes.

Thus, to the extent that quality of home health care is a function of the number of visits, we
would expect impacts on quality of care to appear among the low-use subgroup. The low-use
agencies, by definition, were already starting at a lower baseline of service use than the high-use
agencies (roughly 32 visits during the at-risk period in demonstration Year 1 for low-use control
agencies, versus 53 visits for high-use control agencies). The low-use treatment agencies were then
the most active in cutting visits over time, finishing the study with the lowest levels of service of
any of the subgroups (down to 22 visits during the at-risk period by Year 3, versus 29 for low-use
control agencies, 36 for high-use treatment agencies, and 47 for high-use control agencies).
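
As a quick check on these figures, the sketch below recomputes the treatment-control gaps from the visit counts quoted in the paragraph above. Only those counts come from the text; the variable and function names are illustrative.

```python
# Visits during the at-risk period, as quoted in the text.
YEAR1_CONTROL = {"low-use": 32, "high-use": 53}  # the text calls these "roughly"
YEAR3 = {
    "low-use": {"treatment": 22, "control": 29},
    "high-use": {"treatment": 36, "control": 47},
}


def year3_gap(subgroup):
    """Return the Year 3 treatment-minus-control gap in visits and in percent."""
    visits = YEAR3[subgroup]
    gap = visits["treatment"] - visits["control"]
    percent = 100 * gap / visits["control"]
    return gap, percent


for subgroup in ("low-use", "high-use"):
    gap, percent = year3_gap(subgroup)
    control_change = YEAR3[subgroup]["control"] - YEAR1_CONTROL[subgroup]
    print(f"{subgroup}: Year 3 gap {gap} visits ({percent:.0f}% of control); "
          f"control change since Year 1 {control_change} visits")
```

On these figures, low-use treatment agencies end Year 3 at 7 fewer visits than their controls and high-use treatment agencies at 11 fewer, each a little under a quarter below the control level.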

It  is  surprising,  then,  that  there  was  little  evidence  for  any  effects  in  the  low-use  subgroup. 
Instead,  most  of  the  treatment-control  differences  appeared  in  the  high-use  agency  subgroup 
(Tables III.7 through III.9). Specifically, treatment high-use agencies had a few significantly, but
only slightly, worse rates than control high-use agencies in the stabilization of ADLs in Year 3, and
better rates of improvement in a larger number of clinical symptoms, each in one or two of the three years.

Given the small number and inconsistent direction of the estimated differences, there appear to be
no true effects on ADLs in Years 1 and 2. Table III.7 shows that there are two statistically
significant  subgroup  differences  in  ADLs  in  the  first  year:  in  improvement  in  toileting  (with  the 
treatment  group  having  a  better  rate  of  improvement  in  the  high-use  subgroup  versus  a  lower  rate 
of  improvement  in  the  low-use  subgroup),  and  in  stabilization  in  management  of  medications  (this 
time  with  the  treatment  group  having  worse  outcomes  in  the  high-use  subgroup  and  better  outcomes 
in  the  low-use  subgroup).  In  Year  2,  there  are  three  significant  subgroup  differences:  (1)  in 
improvement  in  grooming  (with  the  treatment  group  doing  significantly  better  in  the  low-use 
subgroup  and  non-significantly  better  in  the  high-use  subgroup),  (2)  in  stabilization  in  grooming 
(with  the  treatment  group  doing  significantly  worse  in  the  high-use  subgroup  and  significantly  better 
in  the  low-use  subgroup),  and  (3)  stabilization  in  toileting  (with  the  treatment  group  doing 
significantly worse in the high-use subgroup and non-significantly better in the low-use subgroup).

In the third year, however, five subgroup differences in ADL measures appear, all in
stabilization outcomes, and all with worse outcomes among treatment patients in the high-use
subgroup, compared to negligible treatment effects in the low-use subgroup. The differences are in
stabilization in the following BADLs and IADLs: (1) grooming, (2) bathing, (3) toileting, (4)
ambulation, and (5) management of medications. Thus, the apparent small unfavorable effects of
prospective payment on ADL stabilization in Year 3 seen earlier in the main effects analysis (Tables
III.1 and III.5) are primarily due to the high-use agency subgroup.

As  for  clinical  symptom  measures,  seven  of  the  nine  significant  subgroup  differences  are  in  the 
improvement outcomes (Table III.8). These seven differences are in improvement in: (1) pain (in
Year  3),  (2)  surgical  wound  status  (in  Year  3),  (3)  dyspnea  (in  Year  2),  (4)  dyspnea  (in  Year  3),  (5) 
urinary  incontinence  or  catheter  present  (in  Year  2),  (6)  confusion  (in  Year  1),  and  (7)  confusion  (in 
Year  3).  Moreover,  these  subgroup  effects  are  due  mainly  to  better  treatment  group  outcomes  for 
these  seven  symptom  improvement  outcomes  in  the  high-use  subgroup.  Treatment-control 
differences  in  the  low-use  subgroup,  in  contrast,  were  negligible  (or,  in  the  one  case  of  surgical 
wound status improvement, marginally worse for the treatment group). The apparent favorable
effects of prospective payment on symptom improvement in the main analysis (Tables III.3 and
III.6) thus also stem from effects in the high-use agency subgroup.

Finally,  there  are  no  significant  subgroup  effects  among  the  outcomes  of  admissions  to  hospitals 
or  home  health  agencies  for  same  body  system  diagnosis  (Table  III.9).  Treatment  low-use  agencies 
appear  to  have  higher  rates  of  same  body  system  diagnosis  admissions  to  home  care  in  Years  1  and  3, 
but the subgroup effect itself, that is, the difference between the two subgroups' treatment-control
differences, does not reach statistical significance.

These results do not lead us to conclude that prospective payment affected high- or low-use
agencies  differently  over  time.  As  pointed  out  earlier,  the  effects  cluster  within  the  high-use  agencies 
and  parallel  those  in  the  main  effects  analysis;  that  is,  they  have  the  same  inconsistencies  in  direction 
(worse  for  the  treatment  group  in  ADLs  but  better  in  clinical  symptoms)  and  range  (stabilization,  but 
not  improvement,  for  ADLs;  improvement,  but  not  stabilization,  for  clinical  symptoms).  For  the 
same  reasons  that  we  were  skeptical  of  the  validity  of  the  results  of  the  main  analysis,  we  likewise 
do  not  consider  these  subgroup  results  to  be  strong  evidence  for  any  true  subgroup  impacts. 
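
The subgroup effect referred to above, the difference between the high-use and low-use treatment-control differences, can be written out as a short calculation. The stabilization rates below are hypothetical percentages chosen for illustration, not values from the report's tables, and the sketch omits the significance test that the analysis applies to the resulting difference.

```python
def subgroup_effect(high_treatment, high_control, low_treatment, low_control):
    """Return both treatment-control differences and their difference."""
    high_difference = high_treatment - high_control
    low_difference = low_treatment - low_control
    return high_difference, low_difference, high_difference - low_difference


# Hypothetical stabilization rates, in percent of eligible patients.
high_diff, low_diff, effect = subgroup_effect(
    high_treatment=88, high_control=91, low_treatment=90, low_control=90
)
print("high-use treatment-control difference:", high_diff)
print("low-use treatment-control difference:", low_diff)
print("subgroup effect (high-use minus low-use):", effect)
```

A treatment-control difference can look notable within one subgroup while the subgroup effect, which compares the two differences, remains too small to distinguish from chance.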

Also, as in the main analysis, it is hard to tie the changes in service use to the effects on both
clinical  symptom  improvement  and  stabilization  in  ADLs.  Among  the  high-use  agencies,  the  Year 
1  to  Year  3  reductions  in  visits  (expressed  as  a  proportion  of  the  Year  1  mean)  were  substantially 
larger  for  nurses  among  the  treatment  agencies,  slightly  larger  for  home  health  aides  among  the 
control  agencies,  and  negligible  for  both  groups  of  agencies  for  physical  therapists.  Although,  as 
we  have  discussed,  reductions  in  nursing  visits  could  potentially  lead  to  measurement  bias  and 
inflated  rates  of  symptom  improvement,  it  is  again  unclear  how  reductions  in  either  nursing  or  aide 
visits  could  lead  to  lower  stabilization  rates  (more  frequent  worsening)  in  ADLs  among  the  treatment 
agencies  without  any  effect  on  rates  of  ADL  improvement.  Perhaps  the  most  important  lesson  is  that 
those  monitoring  quality  under  prospective  payment  should  be  especially  aware  of  the  potential  for 
differential  impacts  on  high-  and  low-use  agencies. 

The  data  do  suggest  that  patient  outcomes  are  not,  in  fact,  directly  linked  to  the  number  of  visits. 
Our  initial  supposition  was  that  subgroup  effects  would  most  likely  appear  among  low-use  agencies, 
because  of  low-use  treatment  agencies'  aggressive  cutting  of  visits  from  already  low,  prevailing  visit 
levels.  Contrary  to  this  supposition,  however,  most  of  the  subgroup  effects  appeared  among  the 
high-use agencies. Even if there really are no subgroup impacts, the absence of any differences in
patient outcomes despite widely varying levels of total visits and of different types of visits still
suggests that patient outcomes depend more on the content of home health care visits than on the
absolute number of visits.

IV.  CONCLUSIONS


In  contrast  to  our  previous  report  on  quality,  which  provided  strong  evidence  that  quality  of  care 
was  maintained  under  prospective  payment,  the  results  presented  here  are  not  as  clear-cut.  Although 
we  can  safely  rule  out  any  large-scale  deterioration  in  quality  of  care,  there  are  several  treatment- 
control  differences  that  are  somewhat  difficult  to  interpret. 

A.   SUMMARY  OF  KEY  FINDINGS 

There  were  no  deleterious  impacts  on  such  major  outcomes  as  mortality,  emergency  care,  or 
hospitalization for same body system diagnoses. In the functioning and clinical symptoms measures,
the  findings  were  mixed.  In  the  third  year  of  the  demonstration,  patients  of  prospectively  paid 
agencies  appeared  slightly  less  likely  to  stabilize  in  a  few  of  the  ADLs.  However,  patients  of 
prospectively  paid  agencies  had  significantly  and  substantially  superior  rates  of  improvement  in 
numerous  clinical  symptoms  throughout  all  three  demonstration  years.  Otherwise,  there  were  no 
effects  on  any  of  the  other  health  outcomes  (improvement  in  ADLs  and  stabilization  in  clinical 
symptoms). 

We  considered  both  pessimistic  and  optimistic  interpretations  for  these  inconsistent  findings. 
The  pessimistic  interpretation  views  the  few  small  negative  treatment-control  differences  in  ADL 
stabilization  in  Year  3  as  early  signs  of  trouble  and  discounts  the  positive  differences  in  clinical 
symptoms  as  artifacts  from  biased  measurement  (due  to  the  shortened  episodes  and  reduced  visits 
among  treatment  agencies).  On  the  other  hand,  the  optimistic  interpretation,  which  seems  equally 
likely,  if  not  more  so,  sees  the  mixed  findings  as  evidence  of  a  probable  absence  of  true  impacts. 
Furthermore, the negative effects in ADL stabilization contradict the absence of effects on ADL
improvement,^ just as the positive effects in clinical symptom improvement contradict the absence
of effects on clinical symptom stabilization. Even measurement bias cannot explain this mixed
pattern of results. Finally, there is no plausible link between the pattern of functioning and symptom
outcomes and the observed impacts on service delivery.

^In fact, we explored whether treatment agencies might be concentrating their efforts on patients
with a higher potential for functional improvement at the expense of those with a lower likelihood,
and found no evidence to support such a hypothesis.

The most prudent conclusion seems to be one of cautious optimism. The absence of adverse
impacts on serious outcomes and the conflicting results on the lesser outcomes are reassuring. The
ambiguity of the findings and hints of possible negative effects should, however, serve as reminders
of the importance of continued close monitoring of the quality of care.

Given the relatively larger impact of prospective payment on service use for low-use agencies
found by Archibald and Cheh (2000), we also studied the high- and low-use agency subgroups for
quality of care. In fact, we found no effects among the low-use agencies, and an inconsistent mixture
of effects among the high-use agencies that resembled those of the main analysis. We thus
concluded that there was no firm evidence that prospective payment affected high- or low-use
agencies differently over time. The most prudent course, however, may be to pay more attention to
this subgroup of agencies in the monitoring of care quality.

B.   STUDY  LIMITATIONS  AND  STRENGTHS

Our study possesses numerous strengths. The random assignment of agencies and the relatively
large number of patient-, agency-, and area-level control variables greatly reduce the possibility of
bias. Sample sizes for nearly all outcomes are large, allowing statistical power to detect even
moderately small differences.^ There are a number of limitations, however.

First,  agency  participation  was  voluntary,  possibly  making  the  results  less  applicable  to  a 
nationwide  program.  Agencies  volunteering  for  the  demonstration  may  have  differed  from  typical 
agencies  in  a  wide  range  of  unmeasured  characteristics  that  affected  their  capacity  both  to  reduce 
services  and  to  maintain  quality  of  care.  Indeed,  the  agencies  in  the  demonstration  were  more  likely 
to  be  large,  full-service  agencies,  and  were  less  likely  to  be  hospital-based.  There  was,  however,  a 
broad  range  of  characteristics  represented  among  the  sample  of  agencies.  We  also  investigated 
potential  subgroup  effects  for  two  agency  characteristics  other  than  use  pattern  (for-profit  status  and 
size),  and  found  none.  These  last  two  points  suggest  that  our  results  are  relevant  for  a  national 
program. 

Second,  the  newly  implemented  national  home  health  prospective  payment  system  differs 
substantially  from  the  one  in  the  demonstration;  thus,  it  may  lead  to  substantially  different  agency 
responses.  The  new  system,  for  example,  does  not  base  agencies'  payments  on  their  prior  costs  per 
episode,  nor  does  it  provide  loss  protection.  These  factors  could  prompt  agencies  to  cut  service  use 
even  more  aggressively  than  they  did  in  the  demonstration,  with  potentially  deleterious 
consequences  for  the  quality  of  care. 

A final caution is that the time-limited nature of the demonstration may cause effects to vary
from those resulting from a permanent policy change. Agencies may respond to payment incentives
that they view as temporary differently from the way they respond to incentives viewed as
permanent. Furthermore, the duration of follow-up time on the agencies is limited, so some
impacts on quality of care may not become apparent until later. The small negative effects on
stabilization in ADLs, for example, did not appear until Year 3.

^Even the outcome with the smallest sample size, improvement in most problematic pressure
ulcer, has annual sample sizes ranging from roughly 1,500 to 2,500. Recall that the sample size for
each outcome varies with the number of patients "eligible" either to stabilize or to improve in that
health measure.

C.   POLICY  IMPLICATIONS 

Despite  the  less  than  clear-cut  results  from  this  study,  we  can  still  infer  that  (1)  considerable 
reductions  in  service  use  are  possible  without  large  decrements  in  quality  of  care,  and  (2)  there  is 
indeed  a  substantial  margin  of  safety.  The  results  presented  here,  along  with  those  in  Archibald  and 
Cheh  (2000),  show  that  service  provision  fell  steadily  throughout  the  first  two  years  of  the 
demonstration  with  no  adverse  effects  on  a  wide  variety  of  outcomes.  Not  until  Year  3  does  a  hint 
of  a  potential  negative  effect  of  prospective  payment  appear,  consisting  of  a  few  small  negative 
treatment-control  differences  in  stabilization  of  ADLs,  and  they  are  counterbalanced  by  several 
sizable,  positive  effects  of  prospective  payment  on  improvement  in  clinical  symptoms  in  all  three 
years.  As  we  have  discussed,  these  conflicting  results  represent,  at  worst,  subtle  negative  effects, 
but  could  as  well  be  viewed  as  an  absence  of  true  impacts. 

The  study  results  provide  a  cautionary  reminder.  While  the  few  small  negative  effects  on  ADL 
stabilization  in  Year  3  may  or  may  not  represent  true  impacts  of  prospective  payment,  they 
nevertheless  should  not  be  dismissed.  The  suggestions,  though  uncertain,  of  possible  negative 
effects  on  beneficiaries'  health  underscore  the  importance  of  HCFA's  efforts  to  implement  ongoing 
programs of quality assurance under the new home health prospective payment system.